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THE  PROBLEM 

The  shimmering  effects  of  atmospheric  turbulence  are  familiar  to  anyone  who  has  looked 
across  the  room  through  the  air  rising  from  a  hot  radiator,  or  at  a  distant  object  across  a 
desert  in  the  summertime.  The  image  appears  to  be  randomly  distorted  and  broken  up 
into  characteristic  ‘speckles’.  This  is  a  real  effect,  not  an  optical  illusion,  so  that  any 
picture  taken  through  the  turbulence  with  a  camera  will  suffer  the  same  problem.  We 
have  studied  how  to  reduce  these  turbulence  effects  on  such  camera  images  after  they  are 
taken.  A  special-purpose  digital  processing  algorithm  is  used  called  the  ‘image  division’ 
method. 

BASIC  ALGORITHM 

The  processing  approach  is  based  upon  the  use  of  two  short-exposure  images,  i.e.  with 
exposure  times  the  order  of  .01s  or  less.  Also,  the  images  are  taken  so  far  apart  in  time  as 
to  ‘sec*  independent  turbulence  distributions  in  the  optical  pupil  of  the  camera.  The  two 
images  are  Fourier  transformed,  and  these  are  divided  one  by  the  other  (thus,  ‘image 
division’).  This  produces  the  basic  data  Dn,  n-  1,..  where  N  is  the  number  of 
frequencies.  The  data  relate  to  the  two  unknown  PSFs  (point  spread  functions)  s^KsJ2* 
via  Eq.  (12)  of  Ref.  [1],  (Note:  See  first  appendix  at  end  of  report.)  These  are  solved 
for,  along  with  the  two  object  estimates  On,oK(2)  via  the  recursive  algorithm  shown  in 
Fig.  1  of  Ref.  [2]  (second  appendix  at  end  of  report). 


TESTS  OF  THE  ALGORITHM 

The  algorithm  has  been  extensively  tested  on  computer  simulations  [2].  More  recently,  it 
has  been  tested  on  infrared  imagery  provided  by  Kitt  Peak  National  Observatory  (K. 
Hege,  J.Garcia).  All  results  are  positive.  The  method  works. 
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We  have  just  obtained  stellar  speckle  imagery  from  the  Univ.  of  New  Mexico,  and  are 
about  to  test  the  algorithm  on  this  as  well 

An  interesting  result  of  these  tests  is  that  the  ‘tails’  of  the  image  contain  significant 
information  about  the  unknown  PSFs.  Hence  the  imagery  must  not  be  spatially  truncated 
while  still  at  significant  values  above  background.  If  these  are  missing,  the  algorithm  is 
rrntWi  into  ‘thinking*  that  there  is  a  missing  source  located  somewhere  outside  the 
truncated  field.  Hence  it  does  not  converge. 

CONCLUSIONS 

The  image  division  method  is  a  working  method  for  effectively  and  speedily  reducing  the 
effects  of  random  turbulence  upon  given  images.  All  processing  may  be  done  on  a 
personal  computer  (as  was  the  case  for  the  testing  above).  No  large  mainframe  computer 
is  needed.  The  algorithm  could  conceivably  be  implemented  via  computer  chip  in  a 
hand-held  ‘smart’  camera.  The  camera  simply  takes  two  successive  images  through  the 
turbulence  each  time  the  shutter  is  activated  The  two  images  are  then  processed  by  the 
image  division  algorithm.  An  FFT  (Fast  Fourier  transform)  chop  can  be  used  to 
implement  all  needed  Fourier  transforms,  and  the  output  image  digitally  viewed  on  a 
liquid  crystal  display. 

Another  possibly  useful  feature  of  the  approach  is  an  outgrowth  of  finding  the  two  PSFs. 
These  should  give  information  about  the  presence  of  wind  shear.  High  winds  ought  to 
cause  highly  speckled  PSFs.  Hence  the  approach  could  potentially  be  used  to  search  for 
wind  shear  conditions  along  an  imaging  path.  Since  the  approach  is  passive,  not 
requiring  active  probes  such  as  via  lidar  or  radar,  it  is  simple,  cheap  and  robust.  Further 
testing  can  ascertain  how  effective  it  is  versus  these  active  approaches. 
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Abstract 

We  show  how,  in  principle,  to  solve  the  ‘blind  deconvolution  *  problem.  This  is  in  the  context  of  the  problem  of  imaging 
through  atmospheric  turbulence.  The  approach  is  digital  but  not  iterative,  and  requires  as  input  data  but  two  short-exposure 
intensity  images,  without  the  need  for  reference  point  sources.  By  taking  the  Fourier  transform  of  each  image  and  dividing,  a 
set  of  linear  equations  is  generated  whose  unknowns  are  sampled  values  of  the  two  random  point  spread  functions  that 
degraded  the  images.  An  oversampling  by  50%  in  Fourier  space  equalizes  the  number  of  unknowns  and  independent 
equations.  With  some  prior  knowledge  of  spread  function  support,  and  in  the  absence  of  added  noise  of  image  detection,  the 
inverted  equations  give  exact  solutions.  The  two  observed  images  are  then  inverse  filtered  to  reconstruct  the  object  O  1998 
Elsevier  Science  B.V.  All  rights  reserved. 


1.  Introduction 

We  first  review  image  sampling  theory  as  it  will  be 
used  in  the  processing  approach.  A  knowledge  of  the 
fundamentals  of  incoherent  image  formation  as  described, 
e.g.  in  Ref.  [l],  is  assumed.  For  simplicity,  we  use  one-di¬ 
mensional  notation.  The  theory  is  trivially  extended  to  two 
dimensions  (e.g.  replacing  all  one-dimensional  Fourier 
sums  and  integrals  with  their  two-dimensional  counter¬ 
parts).  The  simulations  at  the  end  are  two  dimensional,  as 
are  real  images. 

2.  Sampling  theorems 

Consider  an  incoherent  object  of  limited  extension  2x0. 
This  has  an  intensity  profile  o(x),  with  x  the  position 
coordinate,  and 

o(x)«0  for|x|^x0.  (1) 

The  object  is  imaged,  via  a  turbulent  atmosphere,  through 


*  E-mail:  friedenr@super.arizona.edu 


a  lens  system.  Although  the  image  i(x)  has,  theoretically, 
infinite  extension,  in  practice  it  has  finite  extension,  since 
beyond  a  finite  position  all  intensity  values  are  insignifi¬ 
cantly  small.  For  convenience,  we  place  it  along  the 
positive  x-axis,  as  shown  in  Fig.  1.  The  Fourier  transform 
of  the  image,  the  spectrum  7(cu),  is  sketched  on  the  right 
(spatial  frequency  o>  is  in  units  of  radians /length).  For 
simplicity,  only  the  real  part  of  7(o>)  is  shown.  Note  the 
cutoff  for  positive  and  negative  frequencies  at  ±J2. 

The  relation  between  i(x)  and  7(<u)  is,  of  course, 

7(w)-/  dxi(x)e”;"*,  j  -  /  -  1  , 

Jo 

(2) 

As  is  well  known,  the  finite  cutoff  frequency  O  allows  us 
to  use,  with  negligible  error,  a  discrete  sum 

M-  1 

7(«)-  E  Axi(mAx)e^"^,  (3) 

«— o 

where 

A  x-ir/fl,  (4) 

in  place  of  the  integral  form  Eq.  (2).  Quantity  A  x  is  called 
the  sampling  interval,  since  it  is  the  constant  spacing 
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i (x)  X{«) 


Fig.  1.  Image  i(x)  and  its  spectrum  7(«)  (reaJ  part  shown). 


between  sampled  image  values  i(mAjc),  m  -0, 1, M 
-  1. 

Because  of  the  discrete  nature  of  the  sum  in  Eq.  (3)  the 
spectrum  replicates  periodically,  /(  at  +  O  )  *  /(  u>\  as 
shown  in  Fig.  2.  Note  that  the  interval  R  contains  all  of 
the  spectral  curve  7(<t>),  albeit  in  a  rearranged  order  (i.e. 
first  the  positive  portion,  then  the  negative  portion).  This 
permits  us  to  use,  then,  as  the  spectrum 

M-  I 

/(*)=  £  «m3/(«Ax). 

Jti  “  0 

/},  (5) 

in  place  of  Eq.  (3). 

The  computed  spectrum  via  Eq.  (5)  can,  in  principle,  be 
evaluated  at  a  continuum  of  frequency  values  over  the 
given  interval.  But,  for  computational  purposes,  we  evalu¬ 
ate  it  at  a  fme,  discrete  subdivision 


n,  (6) 

N 

These  discrete  values  all  lie  within  the  interval  (0,2/7) 
prescribed  in  Eq.  (5).  The  size  of  N  governs  the  fineness 
of  the  frequency  sampling.  Notice  that  N  can,  in  fact,  be 
taken  as  large  as  is  desired.  We  use  this  fact  in  the 
processing  method  below. 

By  Eqs.  (4)  and  (6),  the  exponent  in  Eq.  (5)  becomes 


2/i  tmt 
2— 


2irjmn 
N  ‘ 


(7) 


J(u) 


Fig.  2.  Replicated  image  spectrum. 


Then  the  sampled  Eq.  (5)  becomes 

M-  1 

/(«„)■/„  -Ax  £  „.01 . tf_l. 

m  -  0 

(8) 

This  is  in  the  form  of  a  discrete  Fourier  transform  (DFT), 
but  with  a  difference:  there  is  a  generally  different  number 
N  of  output  values  in  frequency  space  than  the  M  input 
values  in  direct  space.  Our  ‘division  method*  of  turbulence 
processing,  developed  below,  hinges  upon  the  use  of  cases 
where  N  >  M  by  appreciable  amounts. 


3.  Image  turbulence  problem 


We  now  describe,  in  detail,  the  image  turbulence  prob¬ 
lem  [2-4]  at  hand.  Consider  a  single  object  that  is  imaged 
twice  in  succession  through  random  atmospheric  turbu¬ 
lence.  The  images  are  of  short-exposure  duration,  the  order 
of  1  /60  s  or  less,  but  are  separated  in  time  by  more  than  a 
short-exposure  duration,  say  3/60  s.  The  images,  then, 
‘see*  two  independent  turbulent  phase  distributions  across 
the  optical  pupil. 

The  object  has  an  unknown  spectrum  0((d„)  =  0„,  n  - 

0,1 . N—  1.  Likewise,  the  two  short-exposure  images 

are  degraded  by  two  unknown,  random  optical  transfer 
functions  r(,,(<urt)  =  t„(,), r(2,(  <*>,,)  s  Trt(2);  n  «  0, 1,  N 
-  1.  This  might  be  called  a  ‘triply  blind*  problem,  in  view 
of  the  three  sets  of  unknowns.  The  two  observed  intensity 
images  give  rise  to  two  known  image  spectra  7*°, lf?\  n  ** 
0,1,...,  Af  —  1  as  computed  via  Eq.  (8).  The  image,  object 
and  transfer  function  spectra  are  connected  by  a  transfer 
theorem  [1] 

i-U;  n  *  0,1 . N-  1.  (9) 

(For  the  time  being,  we  ignore  added  noise  of  detection.) 
This  allows  us  to  remove  one  set  of  unknowns  from  the 
problem,  as  follows.  The  approach  is  called  the  ‘image 
division*  method. 
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4.  Image  division  method 


Recall  that  the  image  spectra  /(,\/<2)  are  known  as 
secondary  data  via  Eq.  (8)  from  the  primary  intensity  data 
i(,\  i<2\  Divide  the  two  image  spectra,  to  form  new  data 

/<■>  r<"om  7<n 

. *-»•  <10> 

*n  * n  *n 

The  unknowns  0„  have  dropped  out  As  in  Eq.  (8), 
r(,)  and  r(2>  relate  to  unknown  short-exposure  point  spread 
functions  (PSFs)  s(,)and  r(2)  via  sampling  expressions 
m-  i 


r<‘>  , 


Ax  Y, 


~2w  jmn  /  ft 


1,2. 


(H) 


Substituting  these  into  Eq.  (10)  gives  our  working  expres¬ 
sion 


rr.-c' 


n-  0,1 . AT-1.  (12) 


"  yM-  I  r(2)_- Zvjmn/N  ' 

-  0  ^  m  c 

The  left-hand  side  is  known,  as  data  Dn.  The  PSFs  j(I),  J2) 
are  the  unknowns.  Regarded  as  a  computational  problem, 
Eq.  (12)  represents  N  equations  in  2  M  unknowns.  Recall¬ 
ing  that  N  can  be  made  as  large  as  we  wish,  can  it  be 
made  large  enough  compared  to  2Af  such  that  the  equa¬ 
tions  give  a  solution  for  the  PSFs? 

To  see  what  is  really  going  on,  cross-multiply  Eq.  (12) 
and  bring  everything  over  to  one  side.  This  gives  a  set  of 
N  equations 

E  (jJ^’  -  *=  0,  n  =  0,1 . AT- 1. 

m  -  0 

(13) 

These  are,  in  fact,  linear  in  the  unknowns  s(,),  rf2).  This 
offers  hope  for  a  closed-form,  exact  solution  to  the  prob¬ 
lem.  Such  a  solution  is  found  as  follows. 

Recall  that  the  data  values  are  the  Dn.  These  are 
generally  complex.  Represent  them  as 

(14) 

where  Mn  and  are  known  modulus  and  phase  values. 
(The  factor  2tt/N  is  there  for  later  convenience.) 

Eqs.  (13)  are  complex.  The  real  parts  give  rise  to  one 
set  of  equations,  the  imaginary  parts  to  another.  Using  Eq. 
(14),  these  are 

Repart:  E iil’cos^ ~  j 

-Lam 

m 

Impart:  E*«’s«i(~-j 
+  E  “  mn) 


(2,cos  ^(<*>„-/wi) 


■0, 


■0. 


(15) 


These  are  2/9  linear  equations  in  the  2  M  unknowns 

j(l),/2>. 

One  hurdle  that  has  to  be  overcome  is  that  the  preced¬ 
ing  equations  are  homogeneous  (right-hand  sides  are  0). 
Then  the  direct  solution  for  the  case  N  -  M  would  be  that 
all  PSF  values  are  zero.  This  violates  normalization  and  is 
not,  of  course,  a  physically  useful  solution.  It  is  better  to 
work  with  inhomogeneous  linear  equations.  Luckily,  the 
equations  may  be  made  inhomogeneous  as  follows. 

The  quotient  form  of  the  right-hand  side  of  data  Eq. 
(12)  shows  that  the  data  Dn  are  only  sensitive  to  the  ratios 
of  PSF  values.  Consequently,  knowledge  of  the  Dn  only 
permits  the  s(l),  s(2)  to  be  known  to  an  arbitrary  multiplica¬ 
tive  constant  This  is  acceptable,  since  if  the  PSFs  are 
initially  off  by  such  a  factor  they  are  immediately  cor¬ 
rectable  by  normalization.  (This  models  each  image,  in  the 
usual  way,  as  being  a  redistribution  of  the  fixed  amount  of 
light  energy  that  is  present  in  the  object) 

We  take  advantage  of  such  an  arbitrary  multiplicative 
factor  in  the  following  way.  Any  one  of  the  PSF  values 
may  be  arbitrarily  set  equal  to  1  (say),  and  the  data 
equations  (15)  must  then  give,  when  solved  for  the  other 
PSF  values,  numbers  that  are  correctly  ratioed  to  the  1 .  For 
example,  if  we  set  sty/2  *  1  whereas  its  actual  correct 
value  is  0.50  then  if  the  correct  value  of  (say)  j£2)  -  0.7 
the  solution  to  Eq.  (15)  would  give  -  1.4.  The  only 
problem  such  a  procedure  could  encounter  is  to  set  the 
value  1  equal  to  a  PSF  value  which,  in  reality,  is  0.  There 
is  no  useable  multiplier  of  0  that  brings  it  up  to  a  finite 
number.  Similarly,  if  the  real  PSF  value  is  small,  say 
0.0001,  then  setting  it  equal  to  1  would  give  as  solution  for 
the  other  PSF  values  very  large  numbers.  Such  a  solution 
could  be  unstable  in  the  presence  of  small  errors  in  the 
input  image  data. 

To  avoid  the  latter  problems,  we  equate  to  1  a  PSF 
value  which  is  least  likely  to  be  0.  Such  a  one  would  be  at 
the  center  of  the  PSF,  i.e.  J)/2J  (notation:  [M/2]  -  M/2 
for  M  even  or  (M  —  l)/2  for  M  odd).  The  energy  should 
tend  to  be  maximal  there.  Doing  this  changes  the  homoge¬ 
neous  set  of  equations  (15)  into  the  inhomogeneous  set 


E 

m  +  lM/2) 


J^COS 


(2t rmn  \  f  2ir 

—jj—  j  -  E^m’cos  — ( 4>„~mn ) 


2ir[M/2]n\ 

N  J* 


for  n 


0.1.. ...AT- 1, 


E 

m  #  { M/2) 


(2t rmn  \ 

— J 


+  Zk 


4J)sin 


2*r 

—  (<*>„- m/i) 


2ir[Af/2]n) 

*  r 


for  n  -  1,2, 


,N—  1. 


(16) 


(The  second-set  equation  for  n  «  0  is  the  tautology  0-0 
and,  so,  is  skipped.) 
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Before  attempting  solution,  it  is  necessary  to  delete  all 
equations  which  are  linear  multiples  of  others.  Including 
them  ruins  any  matrix  inversion  approach  to  the  problem. 
It  turns  out  that,  if  one  uses  N  *  2Af  equations,  then  the 
first  M  -  1  equations  and  the  last  Af  +  1  equations  are 
linearly  independent  and  may  be  used  to  form  the  solution 
set.  We  found,  by  simulation,  that  the  method  works.  The 
solution  r(I),  s(2)  incurs  no  error . 

The  preceding  paragraph  applies  specifically  to  one-di¬ 
mensional  (1  X  M )  images.  For  the  corresponding  MX  M 
problem  in  two  dimensions  it  was  found  that  an  NXS 
matrix  of  frequencies  allowed  for  a  unique  solution  if 
1.5Af,  i.e.  an  oversampling  by  50%  in  each  direction 
of  frequency  space. 


5.  Demonstrations 

To  easily  judge  reconstruction  quality,  we  worked  with 
a  simple  object  -  a  large  figure  X  of  intensity  level  M  - 
within  an  M  X  M  image  field  of  size  M  *  16.  To  facilitate 
pixel-by-pixel  comparisons  of  ground  truth  and  recon¬ 
structed  images,  we  display  each  pixel  as  an  easily  recog¬ 
nized  alphanumeric  symbol.  These  cover  the  10  intensity 
levels  (blank).:  /  %  $  Z  #  @  M.  respectively.  Small  field 
size  M  was  also  dictated  by  the  fact  that  any  solution  to 
the  (two-dimensional)  problem  requires  inversion  of  a 


matrix  of  size  2Af 2  X  2 Af2,  which  rapidly  increases  with 
M  and  is  already  of  size  512  x  512  for  our  problem. 

The  PSFs  were  randomly  generated  in  accordance  with 
a  Kolmogoroff  power  spectrum  [2].  For  each  problem,  a 
pair  of  K>Fs  were  so  generated,  convolved  with  the  X- 
shaped  object  to  form  the  two  images,  and  then  processed 
by  the  image  division  method.  In  the  first  demonstration, 
no  noise  of  image  detection  was  added.  Fig.  3  shows  the 
true  PSFs  in  the  left-hand  column,  and  their  reconstruc¬ 
tions,  by  the  image  division  method,  in  the  right-hand 
column.  A  pixel-by-pixel  comparison  shows  that  the  re¬ 
constructions  are  nearly  exact  (to  the  10  gray  level  scale 
used).  Any  errors  are  believed  due  to  round-off  during  the 
calculation,  which  was  done  in  single  precision  arithmetic. 

With  the  PSFs  reconstructed,  each  was  Fourier  trans¬ 
formed  to  give  a  reconstructed  transfer  function  rm.  Using 
Eq.  (9),  each  was  then  divided  into  its  corresponding 
image  spectrum  to  form  an  estimated  object  spectrum.  This 
procedure  is  called  ‘inverse  filtering'.  The  object  spectra 
were  then  Fourier  transformed  back  into  x-space  to  give 
the  reconstructed  objects.  Fig.  4  shows  each  of  the  two 
simulated  images  in  side-by-side  comparison  with  its  re¬ 
constructed  object  by  the  inverse-filtering  procedure.  It  is 
seen  that  the  reconstructions  are  much  improved  over  their 
corresponding  images  and,  in  fact,  are  very  good  in  abso¬ 
lute  terms.  In  an  effort  to  find  why  the  approach  works,  we 
empirically  varied  the  support  widths  of  the  PSFs.  In  those 
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Fig.  3.  PSFl,  PSF2  and  their  reconstructions. 
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Fig.  4.  Turbulent  images  and  their  reconstructions  (no  added  noise  of  detection). 
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cases  where  the  two  PSFs  extend  nearly  to  the  edge  of  the 
field,  it  was  found  that  the  system  of  equations  (16)  has  a 
well-defined  inverse,  so  that  the  method  gives  perfect  or 
near-perfect  results  (as  above).  Such  cases  actually  repre¬ 
sent  a  situation  of  strong  prior  knowledge  of  the  support 
(region  of  non-zero  values)  of  the  PSFs.  That  is,  the 
support  is  known  to  be  approximately  the  size  of  the  field. 
Here  one  can  say  that  the  field  size  is  a  ‘good  (least) 
upper  bound  to  PSF  support 

We  found,  further,  that  a  range  of  PSF  supports  are 
well-accommodated  by  the  approach.  As  long  as  PSF 
values  of  about  1%  of  the  maximum  or  more  exist  near  the 
edge  of  the  field,  then  the  field  size  represents  strong 
enough  prior  knowledge  of  PSF  support  to  give  nearly 
perfect  reconstructions  (assuming  zero  additive  noise  of 
detection). 

But  in  cases  where  the  PSF  values  near  the  field  edge 
are  significantly  less  than  1%  of  the  maximum  value,  the 
PSFs  are  contracted  in  support  toward  the  center  of  the 
field  by  significant,  and  unknown,  amounts.  Then  the 
known  field  size  no  longer  represents  a  ‘good’  upper 
bound  to  the  PSF  support.  In  these  cases,  the  resulting 
system  of  equations  (16)  turns  out  to  have  a  poorly  defined 
inverse;  no  solution  is  forthcoming  for  the  PSFs  using 
single-precision  arithmetic. 

From  this  we  conclude  that  good  prior  knowledge  of 
PSF  support  is  important  to  effective  use  of  the  approach. 
In  fact,  PSF  support  can  be  computed  from  prior  knowl¬ 
edge  of  the  overall  strength  of  the  turbulence  (as  measured 
by  the  Fried  r0  extension  parameter  [5,2],  for  example). 
Support  knowledge  is  often  a  strong  aid  in  the  estimation 
of  PSFs.  For  example,  knowledge  of  the  support  for  the 
Fourier  transform  of  the  PSFs,  the  OTFs  t**\t*2\  are  a 
vital  part  of  the  blind  deconvolution  approach  of  Holmes 
(6]. 

In  the  preceding  demonstration,  although  the  two  PSFs 
are  random  there  was  no  added  randomness  due  to  noise  of 
image  detection.  To  test  the  sensitivity  of  the  approach  to 
such  additional  noise,  we  added  1%  additive,  Gaussian 
noise  to  both  images  and  re-processed  them.  The  result 
was  a  significant  deterioration  in  the  quality  of  the  PSF 
reconstructions  (not  shown  for  brevity),  but  not  enough  to 
wipe  out  reconstructed  object  details.  Fig.  5  shows  the 
turbulent  and  (now)  noisy  images  for  this  case,  on  the  left, 
along  with  their  two  corresponding  object  reconstructions 
on  the  right  The  X  shape  of  the  object  is  still  clearly 
visible  in  both  reconstructions. 


6.  Discussion 

As  was  seen,  the  image  division  method,  as  currently 
used,  works  within  a  domain  of  1%,  or  less,  noise  of 
detection.  However,  as  with  other  poorly  posed  problems 


[4],  the  image  division  method  should  be  capable  of  ‘regu¬ 
larization’,  such  that  it  will  tolerate  noise  levels  on  the 
order  of  10%.  Luckily,  there  are  many  candidate  regular¬ 
ization  schemes  to  choose  from  [4,7,8].  We  believe  that 
this  should  be  the  main  thrust  of  future  research  on  the 
problem. 

There  may  be  something  to  gain  in  noise  de-sensitiza¬ 
tion  by  sampling  at  finer  subdivisions  than  as  used  in  the 
demonstrations.  This  would  lead  to  a  situation  of  more 
equations  than  unknowns,  and  the  transition  to  a  least- 
squares  solution.  Such  a  solution  effects  a  certain  degree  of 
regularization  and,  so,  will  be  one  of  the  regularization 
schemes  (referred  to  above)  that  are  experimented  with. 
The  main  problem  that  is  anticipated  with  this  increased- 
data  approach  is  the  need  for  higher-rank  matrix  operations 
(transposition,  multiplication,  inversion),  which  are  heavy 
consumers  of  computation  time.  However,  there  are  ways 
of  getting  around  this  problem,  such  as  by  posing  the 
inversion  problem  as  a  minimization  problem  and  seeking 
a  maximum  gradient  solution  instead.  Such  an  approach 
would  avoid  the  need  for  matrix  inversion,  for  example. 

The  approach  seems  generalizable  to  a  situation  where 
more  than  two  short-exposure  images  are  at  hand.  For 
example,  in  die  case  of  three  images  7(1),  /(2)  and  /<3)  two 
(now)  data  sets  Dj0*/" Vf?1  “d  (see 

Eq.  (10))  could  be  formed,  the  object  spectrum  0„  would 
drop  out  as  before,  and  two  sets  of  linear  equations  (of  the 
form  of  Eq.  (13))  in,  now,  the  three  PSFs  41’,  s^and  s£> 
would  result  Combining  the  two  sets  of  linear  equations 
into  one  should  result  in  a  problem  that  is  soluble  by 
straightforward  linear  inversion.  Given  the  increased 
amount  of  independent  data  in  the  problem,  a  higher 
degree  of  regularization  should  be  attainable.  The  question 
is  whether  this  benefit  would  offset  the  added  price  to  be 
paid  in  added  stability  and  tracking  needs,  as  well  as  image 
manipulation  and  storage  needs. 

This  study  shows  that,  both  by  theory  and  computer 
simulation,  random  atmospheric  turbulence  is  not  the  ulti¬ 
mate  iimitor  of  object  resolution  quality.  It  is  noise  of 
detection,  coupled  with  lack  of  knowledge  of  spread  func¬ 
tion  support. 
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Regularization  of  the  image 

division  approach  to  blind  deconvolution 

Sergio  Barraza-Felix  and  B.  Roy  Frieden 


A  problem  of  blind  deconvolution  arises  when  one  attempts  to  restore  a  short-exposure  image  that  has 
been  degraded  by  random  atmospheric  turbulence.  We  attack  the  problem  by  using  two  short-exposure 
images  as  data  inputs.  The  Fourier  transform  of  each  is  taken,  and  the  two  are  divided.  The  result  is 
the  quotient  of  the  two  unknown  transfer  functions.  The  latter  are  expressed,  by  means  of  the  sampling 
theorem,  as  Fourier  series  in  corresponding  point-spread  functions,  the  unknowns  of  the  problem.  Cross 
multiplying  the  division  equation  gives  an  equation  that  is  linear  in  the  unknowns.  However,  the 
problem  has,  initially,  a  multiplicity  of  solutions.  This  deficiency  is  overcome  by  use  of  the  prior 
knowledge  that  the  object  and  the  point-spread  functions  have  finite  (albeit  unknown)  support  extensions 
and  also  are  positive.  The  result  is  a  fixed-length,  linear  algorithm  that  is  regularized  to  the  presence 
of  4-15%  additive  noise  of  detection.  ©  1999  Optical  Society  of  America 
OCIS  codes:  100.3010,  100.5070,  010.1330. 


1.  Image  Sampling 

For  simplicity  the  algorithm  is  first  developed  for  one¬ 
dimensional  imagery.  Following  this,  the  theory  is 
generalized  to  two  dimensions  to  allow  for  two- 
dimensional  images. 

Consider  an  incoherent  image  i(x)  that  has  a  sharp 
cutoff  frequency  ft.  The  finiteness  of  ft  implies  that 
i{x)  has  infinite  extension.  However,  for  large 
enough  values  of  |x|  it  will  fall  off  to  negligible  values 
compared  with  the  inevitable  noise  of  detection.  For 
analytical  convenience  we  replace  these  values  with 
zeros.  This  is  an  approximation.  Let  this  trun¬ 
cated  image  have  length  L.  For  convenience,  place 
this  image  along  the  positive  x  axis.  Denote  the 
Fourier  transform  of  the  original  image  i(x)  by  70(io). 
The  spatial  frequency  to  is  in  units  of  radians/length. 
Then  70(u>)  —  7(o>),  where 


7(a>).nsw:fin  -  dx  i(x)  exp(-jux).  (1) 

Jo 
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The  image  spectrum  can  be  approximated  by  the 
truncated  image  spectrum. 

By  the  Whittaker-Shannon  sampling  theorem,1 
the  finite  cutoff  frequency  ft  allows  us  to  use  in  place 
of  Eq.  (1),  with  zero  error,  the  discrete  sum 

M-l 

7(w)-ns*sn  =  2  Ax  ifaAx)  exp(-.;'a)mAx).  (2) 

m-0 

Here  M  -  1  =  L/ Ax  and  the  fixed  interval 

Ax  2s  Tr/ft  (3) 

is  called  the  sampling  interval,  because  it  spaces  the 
sampled  intensity  image  as  values  i(mAx). 

Because  of  the  discrete  nature  of  the  sum  in  Eq.  (2) 
the  spectrum  replicates  periodically.  This  permits 
us  to  use,  then,  as  the  spectrum 

/(«*>) os.s2n  =  2  Ax  im  exp(-ywmAx), 

m*0 

i„  =  i(mAx)  (4) 

in  place  of  Eq.  (2). 

The  spectrum  computed  with  Eq.  (4)  can,  in  prin¬ 
ciple,  be  evaluated  at  a  continuum  of  frequency  val¬ 
ues  over  the  given  interval.  For  computational 
purposes  we  evaluate  it  at  a  fine,  but  discrete,  sub¬ 
division 

u»B  -  (2 n/N)n,  n  =  0, 1, . . . ,  N  -  1.  (5) 
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Thase  discrete  values  are  all  within  the  interval  (0, 
2fl$  prescribed  in  Eq.  (4).  The  size  of  N  governs  the 
fineness  of  the  frequency  sampling.  Notice  that  N 
can,  in  fact,  be  taken  as  large  as  is  desired.  We  use 
this  fact  in  the  processing  method  below. 

By  Eq.  (5),  the  exponent  in  Eq.  (4)  becomes 


2  n  mi r 

ji*nmAx=j  —  il  —  = 


2  irjmn 
N 


(6) 


Then  Eq.  (4)  simplifies  to 


(For  the  time  being,  we  ignore  added  noise  of  detec¬ 
tion.)  This  allows  us  to  remove  one  set  of  unknowns 
from  the  problem,  as  follows. 

3.  Image  Division  Method 

For  convenience,  we  use  the  shorthand  notation  that 
means  n  =  0, 1, , . . ,  N  -  1,  and  similarly  for 
/*2).  Recall  that  the  image  spectra  i<1)  and  /12)  are 
known  as  secondary  data  [from  Eq.  (7)]  from  the  pri¬ 
mary  intensity  data  i(1)  and  i(2).  Form  the  quotient 
of  the  secondary  data: 


M-l 

/(u>„)  tm  exp(-2irjmn/N), 

m*  0 

n  =  0,  l,...,iV-l.  (7) 

Equation  (7)  is  in  the  form  of  a  discrete  Founer 
transform  but  with  a  difference:  generally  the  num¬ 
ber  N  of  output  values  in  frequency  space  is  different 
from  the  number  M  of  input  values  in  direct  space. 
Our  division  method  of  turbulence  processing,  devel¬ 
oped  below,  hinges  on  the  use  of  cases  for  which  N>M. 

2.  Image  Turbulence  Problem 

The  problem  of  imaging  through  a  turbulent  atmo¬ 
sphere  has  a  long  history.2-9  The  problem  is  called 
blind  deconvolution  because  not  only  is  the  goal  of 
the  deconvolution  procedure — the  sharp  object — 
unknown  but  so  is  the  point-spread  function  (PSF)  of 
the  imaging  process.  A  good  reference  on  ordinary 
(not  blind)  deconvolution  methods  is  Jansson.10 
Various  approaches  to  solving  the  blind  deconvolu¬ 
tion  problem  have  been  tried;  see,  for  example,  Refs. 
5-9.  These  are  generally  open-ended  iterative 
searches,  growing  out  of  either  a  gradient  search  of 
solution  space  or  a  replacement  algorithm  (successive 
Fourier  transformation  and  replacement).  Also,  the 
approaches  are  nonlinear  in  the  unknowns  (object 
and  PSF)  in  that  these  unknowns  multiply  each  other 
in  the  imaging  equations.  Our  approach  will  in¬ 
stead  be  of  fixed  length  and  linear  in  its  unknowns. 

The  approach  grows  out  of  the  following  imaging 
situation.  A  single  incoherent  object  is  imaged  twice 
in  succession  through  a  turbulent  atmosphere.  The 
images  are  of  short-exposure  duration,  of  the  order  of 
1/60  s  or  less,  but  are  separated  in  time  by  more  than 
one  short-exposure  duration,  say,  3/60  s.  The  im¬ 
ages,  then,  see  two  fixed,  independent  turbulent 
phase  distributions  across  the  optical  pupil. 

The  object  has  an  unknown  spectrum  0(wn)  58  On, 
n  =  0,  1, . . . ,  N  -  1.  Likewise,  the  two  short- 
exposure  images  are  degraded  by  two  unknown,  ran¬ 
dom  optical  transfer  functions  T(1)(u>n)  3  t^1*,  T(2)(con) 
s  t^2);  n  =  0,  1, . . . ,  N  -  1.  This  might  be  called  a 
triply  blind  problem  in  view  of  the  three  sets  of  un¬ 
knowns.  The  two  observed  intensity  images  give 
rise  to  two  known  image  spectra,  I™  and  Jj®,  n  =  0, 
1, . . . ,  N  -  1,  as  computed  from  Eq.  (7).  As  is 
known,1  the  image,  object,  and  transfer  function  spec¬ 
tra  are  connected  by  a  transfer  theorem, 

I®  =  TJpOn,  i  =  1, 2;  n  =  0,l,...,AT-l.  (8) 


Dn 


if  TfO„ 


Tf 

Tf’ 


n  =  0, 1, . . . ,  N  -  1.  (9) 


(Thus,  the  “image  division”  method.)  Equation  (8) 
was  used.  Note  that  the  object  spectrum  On  can¬ 
celed  out  in  Eq.  (9),  which  eliminates  one  set  of  un¬ 
knowns  from  the  problem.  Such  cancellation  hinges 
on  a  lack  of  image  detector  noise  in  Eq.  (8). 

Now,  as  at  Eq.  (7),  r(1)  and  t(2)  relate  to  unknown 
short-exposure  PSFs  s(1)  and  s<2)  by  means  of  the 
sampling  expression 


u- 1 

if  =  Ax  2)  «»  exp(-2i rjmn/N),  i  =  1,  2.  (10) 

m«  0 

Substituting  Eq.  (10)  into  Eq.  (9)  gives  our  working 
expression: 


M- 1  jM- 1 

D„  =  2  sm  exp(-2Ttjmn/N)  2  sf  exp{-2mjmn/N), 

m*0  •  m“0 

71  =  0,1,...  f2V-l.  (ID 

The  left-hand  side  of  Eq.  (11)  is  known  as  data.  The 
PSFs  s(1)  and  s(2>  are  the  unknowns.  Regarded  as  a 
computational  problem,  Eq.  (11)  represents  N  equa¬ 
tions  in  2 M  unknowns.  Recalling  that  N  can  be 
made  as  large  as  we  wish,  can  it  be  made  large 
enough  compared  with  2Af  that  the  equations  give  a 
solution  for  the  PSFs? 

To  see  what  is  really  going  on,  cross  multiply  Eq. 
(11)  and  bring  everything  over  to  one  side,  which 
yields  a  set  of  N  equations 

2  [sf  -  Dn  Sm’]exp(  -  2-njmn/N)  =  0, 

m*0 

n  =  0,l,...,JV-l.  (12) 

These  are,  in  fact,  linear  in  the  unknowns  s(1)  and  s(2) 
and  thus  offer  hope  for  a  closed-form  solution  to  the 
problem. 

Recall  that  the  data  values  are  the  Dn.  These  are 
generally  complex.  Represent  them  as 

Dn  =  Af„  exp(2iy<t>B/N),  (13) 

where  Mn  and  <|>B  are  known  modulus  and  phase 
values.  (The  factor  2it /N  is  there  for  later  conve¬ 
nience.)' 

Equation  (12)  is  complex.  The  real  parts  of  Eq. 
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(12)  give  rise  to  one  set  of  equations;  the  imaginary 
parts,  to  another.  By  Eq.  (13),  these  are 


Re  part:  s”’  cosf2'^1— j  ~^Mn 


,<2) 


X  cos 


2ir 

~N 


(4>„  -  mn ) 


=  0, 


Im  part:  2  sm"  sin|^pj  +  Mns\ 


x  sin 


2tt 

~N 


(<t>„  -  mn) 


=  0.  (14) 


These  are  2 N  linear  equations  in  the  2 M  unknowns 
s(1)  and  s(2). 

4.  Overcoming  the  Homogeneity  Problem 

One  hurdle  that  has  to  be  overcome  is  that  the  pre¬ 
ceding  equations  are  homogeneous  (the  right-hand 
sides  are  0).  We  need  a  right-hand-side  vector  of 
known  values  if  the  system  of  equations  is  to  have  a 
solution  by  conventional  linear  inversion.  Another 
problem  is  one  of  uniqueness  of  solution:  Not  only 
do  the  true  PSFs  s(l)  ana  s(2)  satisfy  the  equations  but 
so  do  the  convolutions  s(1)(x)  ®f(x)  and  s(2)(x)  ®  f(x) 
with  any  kernel  function  f{x).  [By  Eq.  (9),  the  noise- 
free  data  Dn  are  invariant  to  multiplication  of  each  of 
and  by  an  arbitrary  function  Fn  in  frequency 
space.]  The  implication  is  that  the  matrix  of  coeffi¬ 
cients  in  Eqs.  (14)  is  rank  deficient  if  the  data  Mn ,  4>n 
contain  no  noise.  This  point  is  clarified  in  Section  5. 

This  solution-redundancy  problem  has  to  be  over¬ 
come  because  the  plan  is,  by  Eq.  (8),  to  inverse  filter 
each  of  the  images  Z*,1*  and  with  their  estimated 
transfer  functions  tJ*j  and  t^2>,  respectively.  If  each 
of  the  latter  functions  is  incorrect  by  an  arbitrary 
factor  Fn%  this  will  spoil  the  inverse-filtered  outputs. 
In  fact,  this  problem  will  be  overcome  in  the  process¬ 
ing  algorithm  defined  in  Section  6  below. 

The  inhomogeneity  problem  can  be  overcome  as 
follows:  Aside  from  being  invariant  to  the  operation 
of  convolution  (as  mentioned  above),  the  noise-free 
PSF  solutions  are  invariant  to  simple  multiplication 
by  a  constant  [again,  see  Eq.  (9)).  This  is  acceptable, 
because  if  the  PSFs  are  initially  incorrect  by  such  a 
factor  then  subsequent  inverse  filtering  that  uses 
them  will  also  be  merely  incorrect  by  a  constant  fac¬ 
tor.  This  discrepancy  is  acceptable  because  images 
are  adjusted  in  absolute  brightness  for  the  viewer’s 
convenience  anyhow. 

We  take  advantage  of  such  an  arbitrary  multipli¬ 
cative  factor  in  the  following  way:  The  total  ener¬ 
gies  in  s(1)  and  s(2)  are 


*-1,2.  (15) 

m 


We  may  assume  any  convenient  value  for  Do¬ 
ing  so  merely  scales  s(1)  by  an  arbitrary  factor — 


acceptable  by  the  preceding  argument.  Then  Eq. 
(15)  is  solved  for  any  one  sa ’value,  say, 


tf’ «#u-2  (16) 

m*k 

This  sw  value  is  removed  from  the  left-hand  sums  in 
Eqs.  (14),  because  it  is  no  longer  an  unknown.  In¬ 
stead,  it  forms  a  nonzero  right-hand  side.  Accord¬ 
ingly,  Eqs.  (14)  become 


2*2’ 


m*k 


cosi 


(2-n mn\  /2ir kn\ 

~N~ )  ~  C°S\  N  ) 

2  TT 


.(2) 


x  cos  —  (<J>n  -  mn)  =  -\Ea)  cos 
N 


2  irkn\ 

~N~I’ 


sn) 


m  *k 


+  2>ms; 


.  (2 Ttmn\  .  (2i ikn\ 

smnn  ■  smpr] 

2it 

x  sin  —  (<J>n  -  mn)  =  -E(l)  sin 
N 


(2) 


(2v kn\ 

hr)-  <l7) 


Index  n  —  0, 1, . . . ,  N  -  1  in  the  first  of  Eqs.  (17)  and 
n  =  1,  2, . . . ,  N  -  1  in  the  second  (for  n  -  0  it 
becomes  the  tautology  0  =  0  and,  so,  is  skipped). 
The  equations  are  no  longer  homogeneous,  as  re¬ 
quired.  They  are  also  no  longer  rank  deficient  for  M 
small  enough  (see  Sections  5  and  7). 


5.  Least-Squares  Solution 

Equations  (17)  are  linear  in  the  unknowns  and 
s®.  They  take  the  convenient  matrix  form 


[#>  =  b,  (18) 

where  x  designates  the  vector  of  unknowns  and 
s™  [if]  denotes  the  matrix  of  trigonometric  coeffi¬ 
cients  in  Eqs.  (17),  and  b  denotes  the  vector  right- 
hand  sides  of  Eqs.  (17). 

Recall  that  the  number  N  of  data  frequencies  can 
be  made  arbitrarily  large.  With  a  fixed  number 
2M  -  1  of  unknowns,  the  problem  [Eq.  (18)]  becomes 
overdetermined,  which  allows  a  least-squares  solu¬ 
tion  to  be  sought.  A  least-squares  solution  is  bene¬ 
ficial  because  it  effects  a  degree  of  data  noise 
smoothing,  i.e.,  regularization,  on  its  own.  How¬ 
ever,  we  also  noted  that  the  matrix  of  coefficients  in 
Eqs.  (14)  is  rank  deficient.  This  deficiency  can  be 
overcome  by  the  proper  choice  of  dimension  M  in  Eqs. 
(17),  as  described  next. 

It  was  previously  found11  that  Eqs.  (18)  can  be 
solved  if  the  user  has,  effectively,  prior  knowledge  of 
support  regions  for  the  PSFs.  Then  a  least-squares 
solution  to  the  problem  of  Eq.  (18)  can  be  formed.  As 
much  as  approximately  2%  additive  Gaussian  noise 
can  be  so  tolerated.11  However,  this  tolerance  is  too 
narrow  for  practical  problems.  Something  else 
needs  to  be  inserted  into  the  algorithm,  namely,  the 
use  of  prior  knowledge10  about  the  PSFs,  to  regular¬ 
ize  the  algorithm  to  the  presence  of  greater  amounts 
of  noise. 

Generally  speaking,  the  more  prior  knowledge 
there  is  about  the  unknowns  of  a  problem,  the  better 
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is  the  degree  of  regularization  that  can  be  attained. 10 
Two  forms  of  prior  knowledge  are  at  hand  here.8 
One  is  finite  support  regions  for  the  PSPs,  as  dis¬ 
cussed  above.  Another  is  positivity.  Because  our 
images  are  incoherent,  the  object  and  the  PSF s  rep¬ 
resent  energy  distributions  and,  hence,  must  obey  a 
condition  of  positivity: 

.  om>0,  sL2,sO.  (19) 

We  return  to  the  knowledge  of  the  finiteness  of 
support.  Both  the  object  and  the  PSFs  are,  effec¬ 
tively,  zero  outside  regions  of  finite  extension.  Also, 
consider  only  a  case  in  which  these  extensions  are 
small  enough  that  the  image  field  contains  all  the 
image  energy:  None  of  it  spills  outside  the  field. 

Generally  denote  the  true  value  of  a  support  exten¬ 
sion  by  K  and  an  estimate  by  K.  The  object  is  as¬ 
sumed,  for  simplicity,  to  lie  within  a  square  field  of 
linear  extension  Kob .  The  two  PSFs  have,  in  gen¬ 
eral,  different  ^-component  supports  and  different 
y-component  supports.  However,  by  a  trick  [see 
Eqs.  (27)  below]  we  can  make  them  both  have  effec¬ 
tively  the  same  x-component  support,  denoted  Kx, 
and  the  same  y -component  support,  denoted  Ky. 

This  knowledge  of  prior  support  mathematically 
affects  the  rank  of  problem  (18),  as  follows:  Suppose 
that  both  PSFs  s £  and  $%]  have  a  true  common 
support  value  K.  Also,  let  the  estimated  common 
support  value  be  K.  Then  the  upper  value  for  m  in 
the  sums  in  Eqs.  (17)  will  now  be  K  instead  of  the 
image  support  value  M .  This  means  that,  effec¬ 
tively,  many  of  the  PSF  values  are  equated  to  zero 
and,  consequently,  the  matrix  [H]  becomes  narrower. 

Matrix  [H]  is  full  rank  or  rank  deficient,  depending 
on  the  presumed  support  size  It,  as  follows:  First 
consider  a  case  of  noiseless  data.  If  the  guess  is  that 
£  =  Af,  then  any  PSF  answers  are  permitted  that  are 
wider  than  true  support  value  K  through  convolu¬ 
tions  with  any  kernel  function  f(x)  (see  the  beginning 
of  Section  4)  of  support  length  (Af  -  K).  Matrix  [H] 
mirrors  this  ambiguous  situation  by  not  permitting  a 
unique  solution,  i.e.,  by  being  rank  deficient.  Like¬ 
wise,  if  is  made  any  value  larger  than  the  true 
support  value  Kt  the  matrix  will  still  be  rank  defi¬ 
cient.  [Now  the  kernel  function  need  have  only  sup¬ 
port  length  (K  —  X).]  However,  once  it  :£  K  the 
preceding  convolution  effect  can  no  longer  hold  (the 
kernel  would  necessarily  have  negative  support). 
These  arguments  were  borne  out  in  computer  simu¬ 
lations  as  described  in  Section  7. 

On  the  other  hand,  with  data  noise  present  there  is 
no  problem  of  rank  deficiency  for  [JS].  Rank  defi¬ 
ciency  followed  ultimately  from  the  fact  that  the 
noise-firee  data  Dn  defined  by  Eq.  (9)  are  invariant  to 
multiplication  of  both  transfer  functions  t|11)  and  tJ,2) 
by  an  arbitrary  function.  However,  with  data  noise 
present  in  the  image,  the  division  of  their  spectra  7^1) 
and  7^2)  in  Eq.  (9)  no  longer  results  in  a  simple  quo¬ 
tient  of  transfer  functions  t^1)/t^2).  Hence  there  is  no 
longer  a  possible  ambiguity  in  the  data  owing  to  mul¬ 
tiplication  of  the  transfer  functions  by  an  arbitrary 


function.  In  essence,  the  noise  breaks  the  ambigu¬ 
ity.  Correspondingly,  matrix  [H]  should  be  full  rank 
for  any  choice  of  support  K  as  long  as  the  latter  does 
not  exceed  the  image  support  size  M.  Again,  this 
behavior  was  followed  in  the  computer  simulations. 

With  presumed  support  values  Kx  and  KyJ  problem 
(18)  becomes 

=  b*.  (20) 

Equation  (20)  can  be  solved  in  the  least-squares 
sense, 

\\[Hk]xk  -  bill  =  min.,  (21) 

as  follows: 

We  used  one-dimensional  notation  above.  Now 
we  proceed  to  the  full,  two-dimensional  case.  It  is 
easy  to  see  that  equations  of  the  same  form  as  Eq.  (20) 
result,  provided  that  the  now  doubly  subscripted  un¬ 
knowns  and  s(^n  are  packed  as  a  new,  one¬ 
dimensional  vector  x%  of  length  M  =  KJty.  The 
number  of  unknowns  is.  as  in  one  dimension,  (2 M  —  1). 
Also,  N  is  now  the  chosen  number  of  two-dimensional 
frequencies  (u>m,  a>n).  As  in  one  dimension  there  are 
2 N  —  1  equations.  However,  ( N  —  1)  of  these  equa¬ 
tions  are  found  to  be  repetitions  and  so  are  ignored. 
This  leaves,  then,  N  equations  in  the  (2 M  —  1)  un¬ 
knowns.  However,  if  the  support  lengths  Kx  and  Ky 
are  chosen  small  enough  relative  to  N ,  the  net  matrix 
[Hr]  will  be  tall  and  full  rank,  so  a  least-squares 
solution  to  Eq.  (21)  can  be  sought. 

Examples  of  least-squares  solutions  are  provided 
by  the  simulations  below.  There  the  image  spectral 
field  is  of  dimension  N  =  24  x  24  =  576  frequencies. 
Thus  there  are  576  equations  in  problem  (20).  Also, 
the  spatial  image  field  is  made  to  be  16  x  16  pixels. 
Hence  any  PSF  with  a  support  length  of  16  or  more 
will  cause  aliasing  effects.  The  true  PSF  supports 
are  made  to  be  Kx  =  Kv  =  8  pixels.  The  status  of  the 
matrix  [Hr]  with  respect  to  rank  sufficiency  is  found 
to  be  as  follows:  When  no  noise  is  added  to  the  im¬ 
age  data  the  matrix  is  full  rank  when  Kx  ^  8  and  Ky 
<  8.  With  added  noise,  the  matrix  is  full  rank  when 
both  Kx  ^  15  and  Ky  ^  15.  The  noise  breaks  the 
ambiguity  problem,  as  was  mentioned  above. 

6.  Net  Algorithm 

The  preceding  considerations  provide  the  basis  for 
a  fixed-length  restoration  approach.  The  approach 
would  be  simply  the  preceding  least-squares  method 
if  the  correct  supports  Kx  and  Ky  were  known.  How¬ 
ever,  of  course  they  are  not.  Therefore  the  plan  is  to 
form  a  series  of  least-square  solutions  Xlsq  to  Eq. 
(21),  using  a  fixed  sequence  of  trial  values  Kx  and  Ky, 
and  then  in  some  way  to  judge  which  solution  is  the 
best  of  the  lot.  The  best  solution  is  defined  to  be  the 
one  that  gives  a  pair  of  image  estimates  that  best 
agree  with  the  two  given  images,  subject  to  enforced 
positivity  [inequalities  (19)]  on  the  running  object 
and  PSF  estimates.  The  algorithm  is  given  next 
and,  for  the  reader's  convenience,  is  also  shown  sche¬ 
matically  in  Fig.  1: 
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Convolve 


Fig.  1.  Flow  chart  for  the  image  division  program. 

(1)  A  pair  of  short-exposure  images  i(1)  and  i(2)  is 
given.  These  are  the  primary'  data  for  the  problem. 
From  these  images,  spectra  and  T2)  are  formed  by 
a  discrete  Fourier  transform,  Eq.  (7).  These  spectra 
are  divided,  as  in  Eq.  (9),  to  form  quantities  Dn. 
From  Eq.  (13),  the  associated  modulus  and  phase 
quantities  Mn,  and  <t>„  are  formed.  The  latter  may 
be  regarded  as  secondary  (i.e.,  preprocessed)  data  for 
the  problem. 

(2)  A  pair  of  support  estimates  ttx  and  tty  is  chosen. 
These  supports  are  both,  of  course,  smaller  than  the 
total  image  field  extension,  so  this  choice  reduces  the 
number  of  unknowns  x  in  problem  (21),  as  discussed 
above.  The  support  choice  strongly  affects  the  solu¬ 
tion. 

(3)  The  least-squares  solution  x^sq  ■  §w,  i(2)  to 
problem  (21)  is  formed  by  use  of  orthogonal- 
triangular  factorization.12  This  factorization  avoids 
the  need  to  take  matrix  inverses,  a  numerically  un¬ 
stable  operation  for  poorly  conditioned  matrices. 
The  PSF  outputs  are  corrected  to  obey  positivity  (19). 
The  rule  is  to  replace  all  negative  values  by  0.  Call 
the  outputs  s™  and  s(2). 

(4)  The  discrete  Fourier  transforms  [Eq.  (7)]  of 
these  outputs,  t(1)  and  t(2),  are  taken  and  are  used  in 
Eq.  (8)  to  form  object  estimates: 

CP  =  P/P,  i  -  1, 2  (22) 

by  inverse  filtering. 


(5)  These  estimates  are  inverse-Fourier  trans¬ 
formed  into  (x,  y )  space  to  yield  object  estimates  6<‘), 
i  =  1,  2. 

(6)  These  object  estimates,  in  general,  will  not  obey 
positivity  (19).  We  enforce  positivity,  as  in  step  (3), 
i.e.,  by  simply  zeroing  every  negative  value. 

(7)  The  object  should  have  a  support  that  is  consis¬ 
tent  with  that  of  the  image  and  that  of  the  PSFs. 
Define  (for  example)  the  x-component  support  Kirnr  in 
the  image  as  its  linear  extension  at  the  2%  level  of 
maximum  intensity.  Then,  with  the  trial  value  ftx 
serving  as  the  x-component  support  of  the  PSFs,  the 
x-component  support  value  ttohx  of  the  object  is  taken 
to  obey 

m  Kimx  -  ttz  +  1.  (23) 

The  estimated  object  is  zeroed  outside  this  support 
interval.  The  same  operations  are  performed  for  the 
y-component  direction. 

The  outputs  of  steps  (6)  and  (7)  are  called  6+’, 
i  =  1,  2. 

(8)  Each  estimated  PSF  from  step  (3)  is  convolved 
with  a  corresponding  object  estimate  from  step  (7), 
forming  estimated  images 

2?  -  s':1  0  6(i’,  i  =  1, 2.  (24) 

The  symbol  ®  denotes  a  convolution  operation. 

(9)  The  extent  to  which  these  estimated  images 
agree  with  the  given  images  i(1)  and  i(2)  defines  how 
valid  were  the  hypothetical  support  values  in  step  (2). 
For  example,  poor  agreement  is  taken  to  imply  in¬ 
valid  support  values.  We  quantify  poor  agreement 
by  forming  an  error  metric  over  both  images: 

e  e  eU)  +  e(2),  e«>  -  (25) 

The  vertical  lines  signify  the  absolute  value  operation 
on  the  vector  within.  Parameter  c  measures  the  in¬ 
consistency  of  the  solution  with  the  image  data. 

(10)  The  value  of  e  is  registered,  and  a  new  cycle 
begins  at  step  (2),  with  a  new  choice  of  supports. 

(11)  The  minimum  value  of  e  that  is  found  over  the 
range  of  test  supports  is  taken  to  identify  the  final 
solution  l+\  o+\  i  =  1,  2.  The  arithmetic  average  of 
the  two  object  estimates  defines  the  output  object: 

6  =  +  6®).  (26) 

7.  Demonstrations 

The  effectiveness  of  the  image  division  algorithm 
[steps  (1) — (11)]  is  tested  by  computer  simulation. 
The  computer  is  an  Acer  Model  2016  Emerald  Desk¬ 
top  Computer.  It  has  a  166-MHz  Pentium  processor 
and  a  16-MB  EDO  memory.  Execution  time  is  ~1 
min  for  each  cycle  [steps  (1) — (11)]  of  the  algorithm. 
For  the  16  x  16  pixel  field  used,  a  typical  output 
requires  120  cycles,  or  120  min  in  all. 
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*  Hie  image  spectral  field  size  in  all  cases  is  24  x  24 
’  frequencies,  or  N  =  576.  (Larger  field  sizes  might 
require  the  use  of  a  mainframe  computer.)  The  sub¬ 
division  in  image  space  is  16  x  16  pixels,  and  we 
confine  the  object  to  an  8  x  8  pixel  square  region  in 
the  center  of  the  field.  The  two  PSFs  are  con¬ 
structed  with  support  values  Kx  =  Ky  -  8. 

In  the  solution  search  loop  we  allow  for  all  possible 
support  pairs  (Rx,  RJ  over  the  full  range  of  values  2  £ 
Rx  £  15  and  2  s  Ky  s  15  when  the  simulated  data 
contain  noise,  or  values  Rx  s  8  and  Ry  £  8  when  there 
is  no  noise.  For  these  ranges  of  support  values,  ma¬ 
trix  [Hr\  is  found  to  be  of  full  rank  (as  discussed  in 
Section  5).  In  actual  practice,  when  it  is  not  known 
whether  the  image  data  contain  significant  noise  the 
full  range  of  supports  would  be  used,  of  course.  A 
state  of  rank  deficiency  is  made  known  to  the  user  by 
means  of  a  flag  indicated  by  the  matlab  system  in  use, 
so  this  trial  support  combination  is  simply  skipped  in 
the  search  procedure. 

Additive  noise  of  detection  is  included  in  the  image 
simulations.  Hence  there  are  two  sources  of  ran¬ 
domness  to  overcome:  the  randomness  of  the  PSFs 
and  of  the  detector  noise.  The  latter  is  independent 
Gaussian,  with  a  standard  deviation  <j  that  is  ex¬ 
pressed  as  a  percent  of  the  root-mean-square  signal 
level.  For  example,  4%  noise  means  that  ct  is  4%  of 
the  root-mean-square  image  level. 

In  the  first  tests,  a  letter  C  object  is  used  as  the 
ideal  input;  see  Fig.  2(a).  Note  the  gradual  bright¬ 
ening  from  top  to  bottom  and  the  sharp  edges  bound¬ 
ing  the  letter.  These  features  allow  the  algorithm  to 
be  tested  against  both  subtly  changing  and  rapidly 
changing  gray  levels. 

The  two  PSFs  for  the  4%  noise  case  are  shown  in 
Figs.  3(a)  and  3(b).  These  PSFs  were  constructed 
independently  at  each  pixel  within  a  central  8x8 
field.  Hence  the  true  PSF  support  values  (Kx,  Ky) 
are  (8,  8).  Each  pixel  intensity  is  randomly  chosen 
from  a  uniform  probability  law,  with  thresholding  at 
a  finite  lower  value  to  make  the  black  background 
intrude  randomly  within  the  support  boundary. 
The  aim  is  to  make  the  boundary  less  well-defined  as 
a  square  (8  x  8)  entity,  so  the  support  information  (8, 
8)  does  not  represent  an  unrealistically  strong 
amount  of  information.  The  PSFs  shown  are  typi¬ 
cal  of  the  ones  tested.  Since  the  PSFs  were  gener¬ 
ated  independently  at  each  pixel,  their  power  spectra 
are  flat.  Real  turbulence  is,  of  course,  approximated 
by  a  Kolmogorov  power  spectrum.2  We  judged  that 
a  flat  power  spectrum  would  be  a  harsher  test  of  the 
approach  than  the  Kolmogorov  spectrum  because  un¬ 
correlated  PSF  signals  represent  a  situation  of  max¬ 
imum  ignorance  regarding  PSF  values.  Also, 
uncorrelated  PSFs  were  easier  to  generate  by  com¬ 
puter.  Future  tests  of  the  algorithm  will  be  made 
with  real  (turbulence-degraded)  data,  which  is  of 
course  the  truest  test  of  any  algorithm. 

Reconstructions  of  the  letter  C  by  the  use  of  the 
algorithm  are  given  in  Figs.  2(b)-2(d).  These  recon¬ 
structions  are  for  various  levels  of  image  noise,  as 
indicated.  It  is  seen  that  the  edge  gradients  are  well 
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Fig.  2.  Object  reconstructions  at  given  levels  of  detector  noise  (the 
indicated  coordinates  are  pixels):  (a)  letter  C  object,  (b)  recon¬ 
struction  with  0%  detection  noise,  (c)  reconstruction  with  2 %  noise, 
(d)  reconstruction  with  4%  noise. 


restored  at  these  noise  levels,  although  the  subtle 
gray-level  transition  is  poorly  reconstructed  at  the 
4%  level  of  noise. 

The  reconstructed  PSFs  are  shown,  respectively, 
in  Figs.  3(c)  and  3(d).  It  is  seen  that  at  this  level  of 
noise  there  is  a  tendency,  although  not  a  strong  one, 
for  the  pixel  values  that  are  internal  to  the  PSFs  to 
be  faithfully  reconstructed.  On  the  other  hand,  the 
supports  for  the  PSFs  in  Figs.  3(a)  and  3(b)  are  per¬ 
fectly  estimated  in  the  reconstructions  in  Figs.  3(c) 
and  3(d).  Probably  the  fidelity  of  the  reconstruc¬ 
tions  owes  more  to  the  correct  estimation  of  the  sup- 


Fig.  3.  Data  PSFs  and  their  reconstructions  for  4%  noise:  (a) 
PSF  1,  (b)  PSF  2,  (c)  reconstruction  of  PSF  1,  (d)  reconstruction  of 
PSF  2. 
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Fig.  4.  Data  images  for  4%  noise:  (a)  image  1,  (b)  image  2. 


5  10  15  5  10  15 

Fig.  5.  Impulse  object  case  studies:  (a)  object,  (b)  reconstruction 
with  5%  detection  noise,  (c)  reconstruction  with  10%  noise,  (d) 
reconstruction  with  15%  noise. 


ports  than  to  reproduction  of  the  internal  features  of 
the  PSFs.  Again,  we  tried  to  minimize  this  effect  by 
allowing  the  zero  background  to  intrude  randomly 
within  the  8x8  fields  of  the  simulated  PSFs. 

The  two  images  that  were  used  as  inputs  in  the  4% 
noise  case  are  shown  in  Fig.  4.  These  are,  mathe¬ 
matically,  the  convolutions  of  the  letter  C  of  Fig.  2(a) 
with  the  PSFs  of  Figs.  3(a)  and  3(b),  plus  4%  noise. 
The  images  exhibit  little  resemblance  to  the  letter  C 
and  so  constitute  a  good  test  case  for  the  algorithm. 

It  is  by  now  well  known  that  pointlike  objects  are 
restored  exceptionally  well  when  they  are  con¬ 
strained  to  obey  positivity.10*13  We  therefore  also 
tested  the  algorithm  against  a  cluster  of  point-source 
objects,  which  are  shown  in  Fig.  5(a).  The  image- 
division  algorithm  was  applied  to  image  pairs  of  this 
object  at  three  noise  levels:  5%,  10%,  and  15%. 
The  object  reconstructions  are  shown,  respectively, 
for  these  noise  cases  in  Figs.  5(b),  5(c),  and  5(d).  In¬ 
deed,  these  figures  show  the  three  point  sources  quite 
well,  i.e.,  with  almost  no  blur.  As  a  negative  aspect, 
spurious  background  details  emerge.  These  would 
probably  be  minimized  by  the  use  of  a  nonlinear  re¬ 
storing  technique. 

A  premise  of  the  algorithm  [see  step  (11)]  is  that 
the  data  inconsistency  e  has  a  grand  minimum  at  the 
correct  support  levels  ( K „  Ky)  for  the  case.  To  check 
out  this  assumption  we  have  plotted,  in  Fig.  6,  e 
versus  K*  for  various  values  of  Ky  for  4%  noise. 
From  Figs.  3(a)  and  3(b)  the  true  support  values  are 
here  (i^,  Ky)  =  (8,  8).  It  is  seen  that  the  minimum 
value  is  at  the  low  point  of  the  open-circle  curve,  i.e., 
for  the  point  ()£*,  Ky)  =  (8,  8).  These  are  the  correct 
supports  for  this  problem.  Notice,  however,  that  the 
asterisk  curve  (for  Ky  -  7)  has  a  minimum  that  is 
nearly  as  low  as  that  of  the  open-circle  curve.  This 


problem  worsens  at  higher-noise  cases,  for  which  the 
minimum  may  no  longer  define  the  true  supports. 

8.  Discussion 

The  two  PSFs  might  not  have  the  same  support  Kx 
and  the  same  support  Ky.  In  this  general  case  there 
are  four  support  parameters  to  vary  in  the  algorithm 
loop  (1H11).  This  case  represents  a  four- 
dimensional  space  of  solutions,  computationally  a 
much  more  difficult  problem  to  cope  with  than  the 
two-dimensional  one  that  we  used.  In  fact,  we  can 
convert  the  four-dimensional  problem  into  a  two- 
dimensional  one,  as  follows.  Choose  weights  ox,  a2, 
bv  b2>  0  such  that  new  images  are  formed: 

-  a1i(1)  +  a2;(2\ 

?2)  »  bxi{1)  +  b2ii2\  ax  +  a2  =  1,  bx  +  b2  =  1. 

(27) 


Fig.  6.  Does  data  inconsistency  e  attain  a  grand  minimum  at  the 
correct  support  levels?  Inconsistency  e  is  plotted  versus  support 
component  A,  for  various  values  of  component  Kyi  Ky  =*  9  (plus¬ 
es),  Ky  *  8  (open  circles),  Ky  **  7  (asterisks).  The  correct  support 
levels  (K^rKy)  =  (8,  8)  are  indeed  attained  at  the  grand  minimum 
(see  the  open-circle  curve). 
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%The  new  images  f(1)  and  f*2)  are  used  as  the  inputs  to 
*  the  image  division  algorithm.  It  is  apparent  from 
Eqs.  (27)  that  the  new  images  are  formed  by  net 
PSFs  that  obey,  respectively, 

s<2>  =  61s(1)  +  b2s{2\  (28) 

Because  each  of  the  new  PSFs  is  a  weighted  sum  of 
both  old  PSFs,  the  new  PSFs  must  both  have  the 
same  2-component  support  and  the  same 
y-component  support,  as  we  wanted. 

The  overall  objective  of  regularizing  the  image  di¬ 
vision  method  has  been  partially  achieved.  Depend¬ 
ing  on  the  type  of  object  present,  anywhere  from  4% 
to  15%  noise  can  be  tolerated  by  the  amended  ap¬ 
proach.  The  advantages  of  the  approach  are  the  fi¬ 
delity  of  its  outputs,  its  linearity  and  hence  potential 
speed,  that  it  is  of  fixed  length  and  avoids  an  open- 
ended  search  of  solution  space,  and  that  it  needs  but 
two  short-exposure  images  as  inputs.  The  central 
role  played  by  the  existence  of  finite  supports  for  the 
PSFs  and  the  object  has  become  apparent.  It  re¬ 
sults  that  any  prior  knowledge  of  PSF  support  that 
can  be  built  into  the  algorithm  to  narrow  its  support 
search  will  increase  its  utility.  Also,  the  method  by 
which  positivity  is  enforced  can  be  improved.  The 
zero-replacement  approach  of  steps  (3)  and  (6)  of  the 
algorithm  does  not  permit  data  consistency  in  the 
estimates.  Recourse  to  a  nonlinear  approach  such 
as  maximum  entropy10*13  permits  data  consistency. 
This  should  lead,  ultimately,  to  a  better  estimate  of 
the  PSF  supports  and,  hence,  to  a  better  output  re¬ 
construction  6. 

It  has  been  noted  that  a  practical  technique  for 
dealing  with  the  levels  of  noise  encountered  in  real 
data  has  yet  to  be  found.9  The  image  division  algo¬ 
rithm  might  achieve  this  overall  aim.  The  next  tests 
of  the  algorithm  will  be  on  real  atmospherically  de¬ 
graded  images. 


This  study  was  supported  by  a  grant  from  the  Un¬ 
conventional  Imagery  program  of  the  U.S.  Air  Force 
Office  of  Scientific  Research. 
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A  new  information  matrix  [F]  with  elements  Fm  =  <(ym  —  om)(yH  —  am) 
( 8  In  p(y  |  tt)/dam)(d  In  p(y  |  a)/9a„)>  is  analyzed.  The  PDF  p(y  |  a)  is  the  usual 
likelihood  law .  [F]  differs  from  the  Fisher  information  matrix  by  the  presence  of 
the  first  two  factors  in  the  given  expectation.  These  factors  make  F^  unitless,  in 
contrast  with  the  Fisher  information.  This  lack  of  units  allows  F^  values  from 
entirely  different  phenomena  to  be  compared  as,  for  example,  Shannon  informa¬ 
tion  values  can  be  compared.  Each  element  Fm  defines  an  error  inequality 
analogous  to  the  Cramer-Rao  inequality.  In  the  scalar  case  Fm  =  F,  for  a  normal 
p(y\a)  law  F—3,  while  for  an  exponential  law  F=9.  A  variational  principle 
F=  min  ( called  FM1N )  allows  an  unknown  PDF  p(x)  to  be  estimated  in  the 
presence  of  weak  information.  Under  certain  conditions  F  obeys  a  Boltzmann 
F-theorem"  dF/dt  <  0,  indicating  that  F  is  a  physical  entropy.  Finally,  the  trace  & 
of  [F]  may  be  used  as  the  scalar  information  quantity  in  an  information-based 
principle  for  deriving  distribution  laws  p  of  physics. 


1.  INTRODUCTION 

The  concept  of  Fisher  information* is  a  renowned  and  invaluable  tool  of 
estimation  theory.  However,  Fisher  information  has  a  number  of  properties 
which  are  not  ideal  for  various  purposes.  For  example,  in  describing  a 
parameter  a  with  units  (say)  of  length,  Fisher  information  has  units  of 
1/length2.  Or,  for  a  parameter  with  units  of  mass,  the  information  has  units 
of  1/mass2.  Then  a  comparison  of  the  two  information  values  is  somewhat 
like  comparing  apples  with  oranges.  If,  on  the  other  hand,  the  information 
was  unitless,  such  a  comparison  could  be  effectively  made  (as,  for  example, 


1  Optical  Sciences,  University  of  Arizona,  Tucson,  Arizona,  85721;  e-mail:  friedenr@super. 
arizona.edu. 
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when  comparing  Shannon  information  values  [Eq.  (48)  below]  from  dif¬ 
ferent  phenomena,  all  of  which  are  unitless). 

The  subject  of  this  paper  is,  then,  the  development  of  an  information 
quantity  that  is  related  to  the  Fisher  information  but,  by  contrast,  has  the 
attribute  of  being  units-free.  The  new  information  will  have  other  desirable 
properties  as  well. 

We  start  out,  in  Section  1.1,  with  an  enumeration  of  the  basic  proper¬ 
ties  obeyed  by  Fisher  information,  including  sensitivity  to  units.  Then  in 
Section  1.2  we  show  that  Fisher  information  does  not  remain  invariant  in 
form  to  a  transformation  to  Fourier  space.  Motivated  by  these  nonideal 
properties  of  Fisher  information,  we  introduce  and  develop  in  Section  2 
and  beyond  the  new,  unitless  information  measure. 


1.1.  Basic  Measurement  Theory 

Suppose  that  we  want  to  estimate  a  set  of  parameters  a  =  al,...,ak 
from  data  y  =  y !,...,  }>n-  The  data  have  a  random  component  that  is  usually 
called  “noise.”  The  estimates  are  chosen  functions  a  =  a^y),...,  ak(y)  of  the 
data  called  “estimator  functions.”  A  central  issue  is  how  accurate  these 
estimates  can  be,  in  the  presence  of  given  noise  (as  specified  by  a  PDF 
p(y  |  a)  called  the  “likelihood  law”).  If  the  estimators  are  correct  on  average 
(“unbiased”)  over  all  possible  data  y,  obeying 

<a*(y  )>=ak  (1) 

then  the  mean-square  errors  e*==  <(a*  —  d*(y))2)  of  estimation  obey 

(2) 

This  is  the  Cramer-Rao  inequality.*1*  Quantity  [Z-1]**  is  the  fcth  element 
along  the  diagonal  of  the  inverse  matrix  [/-1]  to  the  Fisher  information 
matrix  [I],  where 


m  =  {'-}.  p-pb '*>  (3) 

Notation  I^p)  means  that  1^  is  a  functional  of  p. 

Equation  (2)  shows  that  the  elements  of  [/]  must  have  units  that  are 
the  reciprocal  of  those  of  the  square  of  any  parameter  ak.  This  has  the 
undesirable  effect  of  making  it  difficult  to  compare  in  some  meaningful  way 
the  Fisher  information  values  for  two  sets  of  data  arising  from  different 
phenomena.  By  comparison,  Shannon  information,  which  is  unitless,  does 
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not  present  this  problem.  One  bit  of  information  has  the  same  meaning 
regardless  of  phenomenon. 

In  physical  applications  mentioned  below,  we  will  be  most  interested 
in  the  case  where  there  is  one  measurement  per  parameter,  and  the  noise 
values  x„  are  independent.  Then  dimension  K=  N  and 

(4) 

(5) 

(6) 

(All  integrals  have  infinite  limits.)  The  tie-in  with  general  matrix  [7-1] 
preceding  is  that  element 


yn  —  ”1"  <2„(y)  ^  1>»*»>  at 

The  Cramer-Rao  relationship  (2),  (3)  now  becomes  more  simply 
e2J„>  1 

din  pn(y„\anr* 


/  dyn  Pn{yn\an ) 


da _ 


[/-']—=»  H.  (7) 

Further,  by  the  form  of  the  first  Eq.  (4)  the  fluctuations  in  yn  purely 
follow  those  in  x„,  so  that 

Pn{yn  I  t*n)  =  PxS y„  -  an  |  a  J  (8)  . 

where  the  latter  is  the  PDF  for  xn.  Assume  also  that  p„  obeys  Galilean  (or, 
shift)  invariance, 


PxK{y*  -  I  an )  =  pX'(yn  -  a„) 


(9) 


Using  this  in  Eq.  (6)  and  changing  integration  variable  to  x  —  yn  —  an  gives 


In  =  In{Px.)  =  \  dx 


{dpxSx)/dxf_ 

Pxn{x) 


(10) 


With  n=  1,...,  N,  this  defines  a  vector  of  information  quantities. 

An  associated  scalar  information  I  may  be  formed  as  the  sum  over  all 
information  components  In.  This  is  also  the  trace  of  the  Fisher  information 
matrix  [/].  Then,  by  Eq.  (10), 


Vp&Vdxr  "  r  ■ 

Px.{*)  V  dx  ) 


=  q%),  where  pXg(x)  =  gn(x)2 


(11) 


(12) 
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defines  real  probability  amplitude  functions  q„{x).  We  see  in  Eq.  ( 1 1 )  that 
I  no  longer  depends  upon  the  parameter  an.  This  is  beneficial  since  a„  was 
unknown,  by  hypothesis. 


12.  Noninvariance  of  Form  Under  Fourier  Transformation 

Information  form  I  of  Eq.  ( 1 1 )  has  been  used  as  the  key  element  in  an 
approach  for  deriving  distribution  laws  px <  or  amplitudes  q„  of  physics.  (2'7) 
The  approach  is  called  the  principle  of  extreme  physical  information  (EPI). 
The  trace  information  form  I  [first  suggested  by  Stam(8)]  defines  the  quality 
of  data  values  in  (initially)  x-space.  However,  in  quantum  mechanics 
x-space  and  momentum  p  space  are  equally  valid  specifiers  of  a  particle, 
and  nature  is  indifferent  to  the  space  chosen  for  the  measurement.  Hence, 
the  information  that  is  used  to  specify  the  quality  of  the  measurement 
should  be  symmetric  to  the  choice  of  space.  That  is,  the  information  should 
have  the  same  form  in  each  space.  However,  as  we  show  next,  information 
I  does  not  have  this  property. 

Consider  the  case  where  the  number  N  of  measurements  is  2.  For  con¬ 
venience,  pack  the  amplitude  functions  (q:,  q2)  as  the  real  and  imaginary 
parts  of  a  complex  probability  amplitude 

\l/  =  qx  +  iq2,  i  —  y/—\  (13) 

Assume  that  the  unknown  PDF  obeys  Galilean  property  (9),  so  that  I  has 
the  form  Eq.  ( 1 1 ).  By  the  use  of  Eq.  ( 13),  it  is  simple  to  show  that  Eq.  (11) 
may  be  expressed  in  terms  of  the  complex  amplitude  function  \{/(x)  as 

7  =  4  j  dx  |^'(x)|2,  \f/'(x)  =  d\j//dx  (14) 

A  complementary  space  to  the  space  x  is  the  Fourier  space  p,  which 
is  specified  by  an  amplitude  function  <f>(p )  obeying*3, 4,7) 


For  a  particle,  Fourier  space  has  the  physical  significance  of  being  momen¬ 
tum  space,  with  p  the  momentum.  When  one  substitutes  Eq.  (15)  into 
Eq.  (14),  the  result  is 


/=^  \  dM  M2  \<Kl*)\2 


(16) 
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This  is  not  of  the  same  form,  in  its  dependence  upon  coordinate  p,  as  is 
Eq.  (14)  in  its  dependence  upon  coordinate  x. 

Hence,  to  attain  the  required  invariance  in  form  requires  a  different 

information  measure. 


2.  A  UNITLESS  INFORMATION  MEASURE 

For  simplicity,  we  first  seek  a  unitless,  scalar  measure  of  information  F. 
This  can  be  independently  derived  in  three  different  ways,  from  different 
aspects  of  the  measurement  process:  (1)  the  use  of  unbiased  data,  (2) 
Fisher  information  1  for  logarithmic  noise,  and  (3)  Fisher  information  /  for 
a  net  PDF  b2x2px{x),  b  =  const  We  conclude  this  section  by  constructing 
a  unitless  information  matrix  F^,  defined  as  the  mean  outer  product  of  a 
certain  vector.  Its  scalar  version  is  the  scalar  F  just  mentioned 


2.1.  Information  F  from  Unbiasedness  Property 

This  derivation  follows  analogous  steps  to  the  standard  derivation0* 
of  the  Cramer-Rao  inequality.  Start  by  assuming  one  measurement  y  that 
is  unbiased, 

0  =  <_y  — a)  =  J  dy{y-a)  p,  p  =  p{y\a)  (17) 

Differentiate  this  d/da,  giving 

J dy(y-a)^-j  dyp  =  0  (18) 

Using  normalization  of  p  and  the  usual  identity  for  the  derivative  of  a 
logarithm  gives 


\dy(y-a)p8-^-  =  l  (19) 

In  preparation  for  the  use  of  Schwarz’  inequality,  we  split  up  the  integrand 
as 


( y-a)Vp 


d\a.p 

da 


=  1 


(20) 
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(This  choice  of  splitting  is  the  departure  from  the  usual  Fisher  information 
derivation.)  The  left-hand  side  is  the  inner  product  of  the  two  bracketed 
terms.  Using  the  Schwarz  inequality  for  this  product  gives 

\dyp  ^dy(y-a)2p  (21) 

Using  normalization  for  p  and  the  definition  of  an  expectation,  we  get 

\i((y-a)2(P^j)  =  F^)  =  F  (22a) 


or 

]  (22b) 

The  latter  is  by  use  of  the  identity  for  the  derivative  of  a  logarithm.  This 
defines  the  new,  unitless  information  scalar  F.  In  general,  it  is  a  functional 
of  PDF  p.  Its  unitless  nature  is  easily  verified.  We  note  that  it  is  always  at 
least  value  unity. 

At  this  point,  it  is  not  clear  why  F  should  be  regarded  as  an  informa¬ 
tion.  It  has  not,  e.g.,  been  related  to  an  error  measure  or  a  signal/noise 
ratio.  This  will  be  remedied  in  Sections  2.2-23. 

Consider  the  case 


y  =  a  +  x,  p{y\a)  =  p{y-a) 

of  Galilean  invariance  for  the  PDF.  Equation  (22b)  becomes 

=  >1,  p  =  p{x) 

Note  that  this  may  be  expressed  in  the  alternative  forms 


(23) 

(24) 

(25) 

(26) 


Comparison  of  Eqs.  (24)  and  (10)  shows  that  the  difference  between  the 
Fisher  information  I  and  information  F  is  the  factor  x2.  This  cancels  the 
units  oc  l/x2  of  /,  making  F  unitless  as  required. 
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Derivations  (17)— (26)  tacitly  assumed  a  continuously  differentiable 
PDF  p  present.  For  a  discrete  probability  law  P{yn  \  a)  similar  results 
follow,  integrals  replaced  by  sums.  Equation  (24)  directly  becomes 


-<[ 


(y„-a)-ln />(*,!*) 


(27) 


[We  trust  that  the  reader  will  not  confuse  the  usual  notation  y„  for  a  dis¬ 
crete  random  variable  with  the  data  notation  yn  mentioned  in  Eq.  (1)  and 
elsewhere.  The  two  notations  are  never  used  together.] 


12.  Examples  of  F  for  Various  PDFs 

As  we  have  seen,  F  is  a  unitless  number  for  any  PDF.  It  is  usual  to 
attach  a  fictitious  unit  to  a  unitless  quantity,  so  as  to  identify  what  kind  of 
quantity  (in  this  case,  what  type  of  information)  it  is.  We  propose  calling 
this  unit  the  “stam,”  in  honor  of  a  pioneer  worker(8)  in  the  field  of 
parameter  estimation. 

In  Table  I,  the  values  of  F  for  some  elementary  scalar  laws  p(y  \  a), 
a  the  mean,  are  displayed.  All  other  parameters,  such  as  a  in  the  normal 
law  case,  are  arbitrary  and  fixed.  Since  the  symmetric  Cauchy  case  (with 
parameter  b)  has  an  intrinsic  a  of  zero,  the  PDF  was  shifted  to  a  finite 


Table  I.  Information  F  and  S/N  Ratio  for  Various  PDFs* 


PDF  law 

F  as  <  •> 

F  value 

S/N  =  r 

Exponential 

«y-a)4>/a4 

9  stam 

1 

Geometric 

a2(a+\)2 

10  — r2 

fa 

N(a.  a2) 

<(y-a)4>/a4 

3 

ala 

Poisson 

<(y»-fl)4>/«2 

3  +  (l/r2) 

A 

„  6  .  (N  +  r2)1 

/  Na 

Binomial 

2  \  \  ✓  i f  /  * 

a\N-a )2 

3~n+  nv 

V  N—a 

A") 

N\(y-a)4y/4a4 

3  +  (6/r2) 

vm 

Cauchy 

/  4 (y-a)4  \ 

\[b2  +  {y-a)2r/ 

3/2 

0 

*  The  mean  value  is  a  in  all  cases.  Parameter  N  is  used  in  a  generic  sense;  N — number  of  trials 
in  the  binomial  case  or  N  =  number  of  degrees  of  freedom  in  the  y2  case. 
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mean  value  a.  The  F  values  were  computed  using  definitions  (22a)  and 
(27),  giving  rise  to  the  expectations  shown  in  the  second  column.  These 
usually  involve  the  fourth  central  moment  of  the  PDF  and  are  evaluated 
in  the  third  column.  The  PDFs  in  Table  I  are  assumed  to  obey  property 
(17)  but  not  necessarily  the  Galilean  property  (23). 

Table  I  shows  that  many  PDF  laws,  such  as  the  normal  and  exponen¬ 
tial  laws,  have  an  F  value  that  is  a  pure  number,  independent  of  the 
parameter(s)  of  the  law.  For  other  laws,  information  F  depends  upon  the 
signal-to-noise  (S/N)  ratio  r  =  a/ a.  This  dependence  is  specified  in  the  last 
column  in  Table  I  in  terms  of  the  mean  a  and  other  parameters  of  the  law. 
This  dependence  upon  r  makes  sense  since  r  is  unitless  and,  by  its  defini¬ 
tion,  F  must  be  unitless. 

In  this  dependence  upon  the  S/N,  F  resembles  the  information  of 
Shannon,  which  often  depends  (logarithmically)  upon  S/N.  On  the  other 
hand,  except  for  the  extra  factor  x2  in  Eqs.  (25)  and  (26),  F  would  be  the 
Fisher  information.  Hence,  F  occupies  a  niche  somewhere  between  Fisher 
information  and  Shannon  information.  Furthermore,  like  the  Fisher  variety, 
F  is  a  measure  of  estimation  error  and  data  error.  These  are  shown  in 
Sections  2.3  and  2.5,  respectively. 

The  Fisher  information  I  for  each  PDF  in  the  table  turns  out  to  be 
value  1/cr2,  the  reciprocal  of  its  variance,  except  for  the  shifted  Cauchy  case, 
for  which  /=  l/lb2.  Note  that  all  these  Fisher  values  have  definite  units, 
i.e.,  the  reciprocal  of  the  square  of  the  units  of  y. 


23.  F  from  Fisher  /  for  Logarithmic  Transformation 

Information  F  can  also  be  shown  to  be  the  Fisher  information  for 
logarithmically  transformed  noise.  Assume  that 

y  =  a  +  x  (28) 

i.e.,  one  shift-added  data  value  (4).  Representation  (10)  for  I  is  equivalent 
to 


7=J  dxpx{x) 


(dpx(x)/dx\2 

\  PA*)  ) 


(29) 


Now  do  a  transformation  of  random  variable  from  x  to  u,  where 


u  =  exp(x/cr) 


(30) 
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?  **  and  a  is  the  standard  deviation  of  x.  Using  (30)  in  (29)  gives 

1 

Squaring  out  the  integrand  and  integrating  term  by  term,  we  find  that  the 
first  term  is  1  by  normalization,  the  cross  term  gives  —2  after  an  integra¬ 
tion  by  parts,  and  the  last  term  is  F(pv)  by  Eq.  (25).  Thus,  Eq.  (31) 
becomes 

I  =  I{px)  =~2  (F{pu)  —  1 )  (32) 

1 7 

Since  by  its  definition  (10)  7^0,  Eq.  (32)  verifies  result  (22)  that  F>  1. 
Solving  (32)  for  F  gives 

F(pv)  =  \+<T2I(pcinU)  (33) 

showing  that  information  F  is  linear  in  the  Fisher  information  I  for  the 
transformed  variable. 

Fisher  information  is  called  an  “information”  because  it  defines  the 
quality  of  the  data  as  measured  by  the  error  e2  of  estimation  given  in 
Eq.  (5).  We  may  now  verify  that  F  is  likewise  an  information.  By  Eqs.  (5) 
and  (32), 

-7,  F(Pu)>  1  (34) 

F{Pu)~  1 

Hence,  the  greater  F  is,  the  smaller  is  the  estimation  error,  verifying  that 
F  is  an  information. 


2.4.  Fas  a  Fisher  Information 

Here  we  show  that  F  for  a  PDF  p  is  equivalent  to  the  information  I 
for  an  associated  PDF  x2p.  This  will  allow  F  to  be  used  in  place  of  /  in  the 
EPI  principle  described  previously.  It  will  also  allow  conditions  to  be  found 
for  which  F  obeys  an  “F-theorem”  analogous  to  the  Boltzmann  H-theorem. 

Given  a  PDF  p(x),  define  an  associated  PDF  p(x)  =  b2x2p(x), 
b  =  const  ( b  is  needed  merely  for  normalization  purposes).  From  this  we 
see  that 


{dp/dx)2_b2 

P 


’x2(dp/dx)2 

P 


+  4p  +  4x(dp/dx) 


(35) 
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Assuming  that  p(x)  obeys  Galilean  invariance,  I  obeys  Eq.  (10).  Then  by 
Eq.  (35) 

I(p)  =  b2(F{p)  +  4  +  4^dxx(dp/dx)J  (36) 

after  deleting  the  subscript  and  using  normalization  for  p(x)  and  the  defini¬ 
tion  (25)  of  F.  Or 

F(p)  =  I(b2x2p)/b2  (37) 

after  an  integration  by  parts  of  the  far-right  integral  in  Eq.  (36),  and  use  of 
normalization,  shows  that  it  cancels  the  term  +4. 


25.  Information  Matrix  [F] 

The  definition  (22a)  of  F  is  the  scalar  case  of  a  more  general  informa¬ 
tion  matrix  [F].  Assume  that  the  data  y  are  independent,  and  with  one 
measurement  y„  for  each  parameter  value  an.  This  amounts  to  a  square 
data-parameter  problem  K=N.  As  in  the  derivation03  of  the  Fisher  infor¬ 
mation  matrix,  define  a  vector  of  length  (iV+  1), 


v  = 


y  i-°i 

(yi-Oi)d  In  pldax 
L  {yN-aN)d\vLp/daN J 


y  i  -  a\ =£i 


(38) 


Next,  form  <vvT>,  the  expectation  of  the  outer  product  of  v.  This 
becomes 


1 _ 

1 — 
0 

0 

IS 

Mn 

1 

1 

0 

1  Fmn 

1 

• 

0 

1 

1 

I*5 

III 

p 

(p)  = 

U ym-am)(y„-an ) 

*?=<«;> 


31np(y  |a)dln/>(y 


(39) 


da. 


t/>(y  1»)\ 

dan  / 


after  some  algebra  (see  the  Appendix).  The  'value  of  element  M\2  depends 
upon  the  particular  law  p(y|a).  Particular  cases  are  Af12  =  1  for  a  Poisson 
law  and  Ml2  =  2 a  for  an  exponential  law  (see  the  Appendix).  Elements  F^ 
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in  (39)  define  an  information  matrix  [F]  which  is  similar  in  form  to  the 
Fisher  information  matrix  (3).  For  our  data  scenario,  the  elements  of  F 
may  be  evaluated. 


[**]« 


Fn  1  • 

1  Fj2  1 

1  1  F3 

1  1 

1  1 


1  1 


1 


1 

Fjoc. 


(40) 


all  l’s  except  for  diag{Fmn)  (see  the  Appendix).  Each  element  F^  reverts 
to  the  scalar  information  F{p)  given  by  Eq.  (25). 

By  its  construction  as  an  outer  product,  <vvT>  is  a  positive-definite 
matrix.  Therefore  the  determinant  of  matrix  (39)  is  positive.  Expanding  by 
cofactors  along  the  top  row  of  (39)  gives 

e\  det[F]  —  M\2Cof{Fn)^0  (41) 

More  generally,  the  approach  (38)  onward  can  be  taken  with  a  top  element 
in  vector  v  which  is  y„  —  a„  —  e„.  This  permits  us  to  replace  subscripts  1  by 
n  in  Eq.  (41).  The  result  is 

elZMlFZ',  «.2  =  <sJ>  (42) 


The  data  error  goes  inversely  with  [F],  verifying  that  [F]  is  indeed  an 
information  quantity.  This  inequality  may  be  compared  with  the  Cramer- 
Rao  inequality  (2). 


3.  TRANSFORMATION  PROPERTIES  OF  F 

Information  F  has  some  important  invariance  properties,  next  derived 
and  discussed. 


3.1.  Invariance  to  Change  of  Scale 

We  continue  with  the  independent  data  case  of  the  preceding  section. 
Suppose  that  a  new  set  of  units  is  chosen  for  the  parameters  a„  and  data  y„, 
so  that 


yn=CHZn) 


&n=CnbH, 


C„  —  consts 


(43) 
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Then 


Qm{  y)  ^rn  —  y  m  Gm~Cm{Zm  bm) 


Also,  by  elementary  Jacobian  theory( 


P(y  I  a)  =  />z(z  I  *>)  det[7(z/y)] 


where 


det[/(z/y)]  =  (1  /CJ 


is  the  Jacobian  of  the  transformation  (43).  Equations  (45)  give 

d  In  p(y  1  a)  _  d  In  p^z  [  b)  db^  _  1  dinpzfzlb) 
dam  dbm  dam  Cm  dbm 

by  Eqs.  (43).  Substituting  Eqs.  (44)  and  (46)  into  the  definition  (39)  of  F„ 
gives 


FmniP)  ~  ^m)  ^'rLZn  ^ n )  q 


1  d  In  p^jt  |  b)  1  d  In  z  |  b) 


dbm  Cn  dbn 


=  ({zm-bm){zn-bn) 


d  In  p^z  |  b)  d  In  pz( z  |  b) 


=  Fmn(Pz) 


The  information  is  the  same  in  the  new  set  of  units. 

It  is  noted  that  neither  Fisher  information  I  nor  the  Shannon  entropy 


H=  -jdyp(y  |a)lnp(y  [a)  (48) 

obeys  this  property.  At  the  other  extreme,  the  Shannon  mutual  information 
S  and  the  Kullback-Leibler  entropy  (AX)  are  invariant  under  any  change 
of  coordinate  system,  even  nonlinear  ones.(10)  However,  this  might  be  an 
unnecessarily  strong  invariance  property.  From  a  measurement  standpoint, 
it  is  sufficient  to  require  invariance  only  in  transforming  from  x-space  to 
Fourier  (momentum)  /i-space,  or  vice  versa:  the  choice  of  measurement 
space  is  arbitrary  to  the  observer,  and  measurement  spaces  are  generally 
Fourier  conjugate  spaces.  A  price  paid  for  the  very  strong  invariance 
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properties  obeyed  by  informations  S  and  KL  is  apparently  that  they  cannot 
be  used  in  a  variational  principle  for  deriving  the  general  PDF  laws  of 
physics  [except  for  those  of  statistical  mechanics0 1)].  Informations  /  and  F, 
on  the  other  hand,  can  be  used  for  this  purpose,  as  discussed  below. 


3.2.  Invariance  to  Conjugate  Space 

The  information  scalar  I  was  formed,  at  Eq.  (11),  as  the  trace  of  the 
Fisher  information  matrix.  The  measurements  were  assumed  to  be  inde¬ 
pendent,  in  a  scenario  of  one  measurement  per  parameter  an ,  and  with 
Galilean  invariance  obeyed.  Continuing  with  this  measurement  scenario, 
we  likewise  form  an  associated  information  scalar  &  as  the  trace  of  the 
elements  F The  latter  are  given  in  Eqs.  (39).  The  result  is  simply  a  sum 
of  information  quantities  (26), 

&  =  4  £  J  dx  x2  )  =  ^(??,~,  q%)  (49) 

[Compare  with  Eq.  (11)  for  /.] 

As  at  Eq.  (13),  assume  N-2  measurements,  and  pack  ( qx ,  q2)  as  the 
real  and  imaginary  parts,  respectively,  of  a  complex  amplitude  function  i {/. 
Then  Eq.  (49)  directly  simplifies  to 

&r  =  4jdxx2\il/'{x)\2  (50) 

[compare  with  Eq.  (14)].  We  now  use  the  Fourier  relation  (15)  between 
\}/{x)  and  conjugate  function  to  represent  3F  in  conjugate  space. 

Differentiating  Eq.  (15)  gives 


=  ~yg^~3/2  J  (51) 

Using  this  in  Eq.  (50)  and  interchanging  orders  of  integration  gives  directly 

&  =  2~l3  |  dF  J  <¥  J  dx  (52) 


The  last  integral  is  —  2i&?5”{p!  —  fi)  where  8 (//)  is  the  Dirac  delta  function. 
Using  its  sifting  property  in  Eq.  (52)  gives 


&  =  -4 1  dn  w*(//)] 
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Integrating  by  parts  yields 

=  =  M' +  <t>\2  (54) 

after  evaluating  the  derivatives.  Squaring  out  gives 

^/4=|^^2|f|2  +  }<4z  M2+j4MW+fV)  (55) 

We  now  show  that  the  third  integral  cancels  the  second. 

An  integration  by  parts  shows  that 

j  dp<l>'(/i<f>*)  =  -j  dp  <f>(p<f>' *  +  <!>*)  (56) 

Adding  to  this  equation  its  complex  conjugate  gives 

J  dfl  M'r  +  V V)  =  -2  J  dp  w2  -  J  dy.  M<T  +  *f )  (57) 

The  far-left  and  far-right  hand  terms  are  the  same.  This  allows  us  to  solve 
for  it.  Placing  it  in  Eq.  (55)  gives  the  desired  result, 

^  =  4|4U„2|#|2  (58) 

This  equation  and  Eq.  (50)  are  of  the  same  form,  showing  that  information 
3F  is  invariant  to  transformation  between  conjugate  spaces.  [We  previously 
showed,  in  Eqs.  (14)  and  (16),  that  Fisher  information  I  does  not  have  this 
property.]  Thus,  measurements  y  made  in  either  of  the  two  spaces  give  rise 
to  the  same  form  of  information  as  required. 


4.  ESTIMATING  PDFS  USING  FMIN  PRINCIPLE 

Consider  a  scenario  where  there  are  insufficient  data  to  uniquely  deter¬ 
mine  a  PDF.  Let  the  PDF  obey  Galilean  invariance  so  that  F  obeys 
Eq.  (26).  Then  F  increases  with  the  gradient  content  of  g(x).  Hence,  if  F  is 
minimized  through  choice  of  q(x),  it  will  produce  a  q(x)  that  has  minimal 
gradient  content  Then  q{x)  is  maximally  smooth  and,  by  p  =  q2,  p(x)  will 


F -Information 


1535 


be  maximally  smooth  as  well.  A  consequence  is  that  p(x)  is  minimally 
biased  toward  particular  values  of  x.  This  is  a  desirable  property  for  an 
estimated  PDF  (see  below).  How,  then,  may  F  be  minimized? 

As  one  possibility,  F  may  be  incorporated  (see  below)  into  the  EPI 
principle*4, 7)  for  deriving  physical  PDFs.  That  approach  assumes  prior 
knowledge  of  the  physical  source  that  is  strong  enough  to  imply  an  exact 
answer  for  the  PDF. 

A  second  possibility,  and  the  one  that  concerns  us  here,  is  to  minimize 
F  subject  to  imprecise  information,  that  is,  prior  knowledge  that  is  not 
sufficient  (as  above)  to  produce  an  exact  answer  for  the  PDF.  It  can  only 
be  estimated,  as  opposed  to  the  error-free  derivation  afforded  by  EPI.  Any 
estimated  PDF  should  be  minimally  biased  toward  particular  x  values, 
since  such  bias  can  often  be  artifactual.  Hence,  a  smooth  estimate  is 
desired. 

Let  the  prior  information  about  p(x)  be  in  the  form  of  expectations 

d„=(kn(x)}=^dxk„(x)q2(x),  n  =  \,...,N  (59) 

where  k„{x)  is  a  known  kernel  function.  Knowledge  of  such  equalities  may 
be  built  into  the  extremization  approach  by  use  of  the  Lagrange  method  of 
undetermined  multipliers.*125  The  result  is  a  principle, 

+  X  A*  J  *«(*)  <l2(x)  =  roia  (6°) 

The  factor  4  in  F  has  been  absorbed  into  the  multipliers  A„.  We  call  this 
the  FMIN  principle.  Some  particular  solutions  of  interest  are  found  next. 

Consider  the  important  case  where 

^W  =  x"-'  (61) 

so  that  the  data  d„  are  moments.  Substituting  Eq.  (61)  into  principle  (60) 
and  using  the  Euler-Lagrange  solution  gives  a  formal  solution, 


d  (dse\ 

dx\dq'J 

-ef= o 

dq 

(62a) 

■2,=*j(j?Y+  i  A„*-y 

\ClX/  n- 1 

(62b) 

We  carry  through  some  particular  solutions. 
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Suppose  that  N=  1.  This  represents  the  weakest  level  of  constraint 
information:  a  normalization  constraint  The  minimum  obtained  in  F  will 
then  be  the  absolute  one.  Substituting  Eq.  (62b)  into  (62a)  gives  a  differen¬ 
tial  equation, 


x2qn  +  2 xq'  —  Xq  =  0,  X  =  Xt 


The  solution  that  allows  for  normalization  is(13) 


g(x)  =  Ax~*,  p(x)  =  A2x-2* 
a  ^  1/2,  *0<*^*i»  a,x0,  Xi=const 


(63) 


(64) 


This  is  a  monotonic,  and  therefore  smooth,  law  as  was  required.  Evaluating 
A  by  the  normalization  requirement  and  substituting  the  resulting  p(x)  into 
Eq.  (25)  gives  F=4a2  (independent  of  x0  and  xx).  Since,  by  (64),  the  mini¬ 
mum  possible  value  of  a  is  5,  this  means  that  the  minimum  possible  value 
of  F  is  unity  [verifying  Eq.  (22)].  We  may  note  that  this  result  holds  even 
in  the  limit  as  x0  ->  0  and  xx  -*  00,  i.e.,  as  the  curve  q(x)  is  allowed  to 
extend  over  the  entire  positive  real  line. 

That  the  answer  for  p{x)  with  oLm  =  |is  a  simple  \/x  law  is  of  interest. 
Such  a  PDF  is  the  only  one  that  is  invariant  to  a  change  of  units/ 14)  This 
might  have  been  expected  since  F  is  itself  unitless,  and  the  constraint  of 
normalization  is  the  weakest  one  possible.  The  l/x  law  has  previously  been 
found(15)  to  describe  the  occurrence  of  unrelated  numbers  (weights,  lengths, 
times,  etc.)  binned  in  a  common  histogram.  Since  the  fundamental  physical 
constants  are,  by  definition,  the  most  unrelated  numbers  of  all,  they  might 
likewise  be  expected  to  follow  the  l/x  law.  In  fact,  this  is  the  case/7, 14) 

A  l/x  law  has  been  suggested06*  to  be  the  appropriate  prior  probabil¬ 
ity  law  for  defining  a  random  variable  about  which  nothing  is  known 
except  normalization  and  its  positivity.  Steps  (60)-(64)  amount  to  a  deriva¬ 
tion  of  this  hypothesis.  To  our  knowledge,  (60)  is  the  only  variational 
principle  that  naturally  (under  minimal  constraints)  gives  rise  to  a  l/x  law. 

The  l/x*  result  followed  from  FMIN  in  the  weakest  possible  constraint 
case.  This  implies  that,  with  stronger  constraints,  FMIN  solutions  will  tend 
toward  l/x*.  This  is  verified  next. 

Consider  the  case  N=  3,  corresponding  to  constraints  on  normaliza¬ 
tion,  the  first  moment  a,  and  the  variance  a2.  In  place  of  Eq.  (63)  we  now 
have  the  differential  equation 


x2^"  +  2 xq'  —  Xx  q  —  X2xq  —  X?x2q  =  0 


(65) 


i 


i 

i 


j 

: 

i 
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The  normalized  solution  that  also  obeys  the  three  constraints  is  the  y-  (or 
chi-square  )-law 


p(x)  =  Axfie~rx,  x^O 

A  =  gYV/V),  =  (66) 

y-r2/^  r  =  a/a 

[Note:  This  assumes  that  a^O.  If  instead  a^O,  then  the  domain  is 
and  x  is  replaced  by  ( —x)  on  the  right-hand  side.]  The  y-law  answer  is  a 
unimodal,  smooth  function,  as  required.  For  comparison,  the  answer  p(x) 
given  by  maximum  entropy  H  or  by  EPI  under  the  same  constraint  infor¬ 
mation  is  the  normal  law.(4)  This  is  a  smooth  law  as  well.  Which  solution 
to  choose  is  arbitrary,  until  other  considerations  are  made.  If,  for  example, 
the  user  is  convinced  that  the  three  constraints  are  sufficient  to  define  the 
PDF,  then  he  should  choose  the  EPI  approach. 

Solution  (66)  is  a  l/x“-type  law  (64)  modified  by  an  exponential  falloff. 
This  shows  that  the  bias  toward  1/x*  spoken  of  above  is  a  real,  persistent 
effect  Also,  the  dependence  upon  the  given  parameters  a ,  a  is  only  through 
a  and  the  signal/noise  ratio  r.  Once  again,  signal/noise  is  a  natural  specifier 
of  F- theory. 


5.  USE  OF  &  IN  EPI  PRINCIPLE 


The  key  information  quantity  in  the  EPI  principle  is  the  Fisher  trace 
information  I  defined  in  Eq.  (11).  We  show,  next,  that  /  may  be  replaced 
by  a  particular  choice  of  the  F-trace  information  Comparing  the  q-form 
expressions  (11)  for  I  and  (49)  for  8F,  and  using  identities  (26)  and  (37), 
shows  that 


/=  /(??,...,  q%)  =  b2^(q2Jb2x2t...,  q2Jb2x2)  (67) 

By  the  use  of  this  identity,  EPI  may  be  reexpressed  in  terms  of  the  new 
information  & .  This  is  beneficial  because  of  the  favorable  transformation 
properties  that  3F  was  previously  found  to  have.  The  result  is  a  revised  EPI 
principle  based  upon  an  information  that  is  independent  of  the  choice  of 
measurement  space  and  of  the  units  employed. 
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6.  BOLTZMANN  F-THEOREM 

It  was  recently  found(17)  that,  for  temporally  dependent  PDFs  p(x  \  t), 
the  Fisher  information  obeys  an  “I-theorem,” 

dl/dt^  0,  /=/(/?)  (68) 

This  indicates  an  increase  in  disorder  (as  measured  by  /)  with  time  and  is 
analogous  to  the  Boltzmann  H-theorem.  The  proviso  to  (68)  is  that  p  obey 
the  Fokker-Planck  differential  equation.  Then  the  correspondence  (37) 
between  1  and  F  allows  us  to  state  an  “F-theorem,” 

dF/dt^O,  F=F(p)  (69) 

provided  that  the  associated  PDF  p  =  x2p  obeys  the  Fokker-Planck  equa¬ 
tion.  Result  (69)  indicates  that  F  is  a  physical  measure  of  disorder. 


7.  SUMMARY 

A  unitless  variant  F  on  Fisher  information  was  found  to  arise  out  of 
an  approach  ( 17)— (22)  analogous  to  the  usual  derivation  of  the  Cramer- 
Rao  inequality  and  Fisher  information.  This  gives  the  scalar  (single  datum) 
version  of  F.  Correspondingly,  an  F-information  matrix  (39)  or  (40)  arises 
out  of  a  multiple-data  scenario.  This  follows  an  approach  (38)-(39) 
analogous  to  the  usual  derivation  of  the  Fisher  information  matrix. 
Another  result  of  interest  is  the  error  inequality  (42),  analogous  to  the 
Cramer-Rao  inequality. 

Information  F  is  computed  for  various  PDFs,  as  given  in  Table  I.  The 
F  value  for  one-parameter  laws  are  often  pure  numbers  (e.g.,  F=  3  stam  for 
a  normal  law).  For  multiple-parameter  laws,  F  depends  upon  the  S/N  ratio 
(a  unitless  quantity). 

Information  F  is  related  to  the  Fisher  information  /  in  certain  direct 
ways.  For  example,  F  is  linearly  related  to  I  [Eq.  (33)]  when  the  added 
noise  x  in  Eq.  (28)  is  expressed  as  the  logarithm  of  an  associated  random 
variable.  Also,  F  for  a  PDF  p  equals  /  for  an  associated  PDF  proportional 
to  x2p  [Eq.  (37)]. 

The  information  matrix  [F]  is  found  [Eqs.  (43)— (47)]  to  be  invariant 
to  a  change  of  scale. 

Information  F  may  be  used  in  a  variational  principle  (60),  called 
FMIN,  for  estimating  an  unknown  PDF  in  the  presence  of  insufficient 
information.  The  PDF  should  obey  Galilean  invariance.  The  solution  to 
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FMIN  under  only  the  normalization  constraint  is  a  PDF  (64)  of  the  form 
\/x 2*  The  minimum  obtained  for  F  attains  its  absolute  minimum  value, 
of  1,  for  the  case  a  =  1/2.  The  resulting  \/x  law  is  known  to  be  invariant  to 
a  linear  change  of  scale,  corroborating  a  property  previously  found  for  F. 
Also  this  PDF  has  been  proposed  in  the  past  to  define  the  appropriate 
prior  probability  for  a  number  whose  only  known  property  is  that  of 
positivity.(16)  Finally,  this  law  defines  the  histogram  of  unrelated  quantities, 
such  as  masses,  lengths,  times,  etc.,  and  consequently  describes  the 
histogram  of  the  fundamental  physical  constants/7, 14) 

The  solution  to  FMIN  under  constraints  of  normalization  and  the  first 
two  moments  is  a  gamma-  or  chi-square  law  (66).  This  modifies  the  l/x2* 
behavior  previously  found  with  an  exponential  dropoff. 

Information  F  obeys  a  Boltzmann  F-theorem  (69)  provided  that  the 
associated  PDF  x2p  obeys  the  Fokker-Planck  equation.  Then  F  is  a 
measure  of  physical  disorder,  like  entropy  H. 

Just  as  the  trace  of  the  Fisher  information  matrix  is  taken  to  define  the 
scalar  information  /,  the  trace  of  [F]  may  be  taken  to  define  a  scalar  infor¬ 
mation  SF  Eq.  (49)].  This  information  has  the  important  property  that  its 
representation  in  Fourier  (e.g.,  momentum)  space  is  the  same  as  its 
representation  in  direct  (e.g.,  position)  space  [compare  Eqs.  (50)  and  (58)]. 
Finally,  because  of  Eq.  (37),  EPI  may  be  expressed  in  terms  of  the  informa¬ 
tion  scalar  3F  and,  hence,  in  terms  of  an  information  that  is  of  the  same 
form  regardless  of  the  (arbitrary)  choice  of  measurement  space. 


APPENDIX:  CALCULATION  OF  SOME  MATRIX  ELEMENTS 

The  aim  is  to  compute  the  elements  of  <vv>T,  with  vector  v  given  by 
Eq.  (38).  This  is  in  the  presence  of  independent  and  unbiased  data, 

p(yl*)=  FI  p(yn\<*n)  (Ala) 

71—1 

<yB>=0„  (Alb) 

Since  vector  v  is  of  dimension  (Ax  1)  the  outer  product  matrix  <wT>  is  of 
dimension  NxN. 

By  sight,  the  (1, 1)  element  is  < ( jVj  —  «i)2>  =  <«?>  =e\. 

The  (1,  2)  element  is  called  MX2 •  It  is  obvious  that  element  M2X  =  MX2. 
Directly  from  (38), 


Mi2=((yi-aiY 


d  In  p(y  |  a)N 
dax  t 


,  ,2  d\np{yx  |a,)\ 

— *r, — / 


(A2) 
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by  independence  (Al).  We  may  now  drop  subscript  1.  The  answer  for  Mn 
depends  specifically  upon  the  form  of  the  law  p(y  |a).  For  a  Poisson  law 
p{y  |  a)  =  exp(  —a)  ay/y\,  we  have  d  In  p/da  =  (y  —  a)Ja  so  that,  by  (A2), 


Mnm 


<(y-a)3> 


(A3) 


[The  third  central  moment  of  the  Poisson  law  is  the  mean.(18)]  Or,  for  an 
exponential  law  p(y\a)  =  a~l  exp(  -  y/a),  y^O,  we  have  d  In  p/da  = 
(y  —  aVo2;  then  by  (A2) 


«y^>=2a 

a 


(A4) 


The  general  form  for  elements  of  [F]  as  indicated  in  (39)  directly 
results  from  the  outer  product  of  vector  v  defined  in  Eq.  (38).  we  now  verify 
that,  under  the  independence  conditions  (Ala),  (Alb)  the  elements  of  [F] 
are  as  given  in  (40).  First  consider  the  off-diagonal  elements  F^^m^n. 
Expanding  out  the  derivative  of  a  logarithm  and  using  independence  effect 
(Ala),  definition  (39)  collapses  into  a  product  of  one-dimensional  integrals, 

=  FmF„  F. =  f  dy.  (y.  -  a.) 

I/Ujl 

Pn  =  P(y*\an)  (A5) 


We  can  express 


(A6) 


By  the  unbiasedness  property  (Alb)  the  first  integral  is  an,  and  by  nor¬ 
malization  the  second  integral  is  1.  Therefore,  F„=  1,  so  that  by  (A5) 
Fmn=  1.  This  verifies  the  off-diagonal  elements  in  (40). 

Going  through  the  same  procedure  for  the  case  m  =  n,  the  multiple 
integral-mean  in  (39)  collapses  into  a  one-dimensional  integral, 


=  J  dyn  p(yn  |  an)(yH  -  an)2 


d  In  p(yw  |  oj 
dan 


(A7) 


This  is  the  same  as  the  scalar  answer  (22a)  and,  in  case  (23)  of  Galilean 
invariance  for  pnt  readily  goes  over  into  the  form  (25). 


F -Information 


i>u 


-9  4*  * 


REFERENCES 

1.  H.  L.  Van  Trees,  Detection,  Estimation,  and  Modulation  Theory,  Part  /  (Wiley,  New  York, 
1968). 

1  B.  R.  Frieden  and  R.  J.  Hughes,  Phys.  Rev.  E  49,  2644-2649  (1994). 

3.  B.  R.  Frieden,  in  Advances  in  Imaging  and  Electron  Physics,  Vol.  90,  P.  W.  Hawkes,  ed. 
(Academic,  Orlando,  FL,  1995),  pp.  123-204. 

4.  B.  R.  Frieden  and  B.  H.  Soffer,  Phys.  Rev.  E  52,  2274-2286  (1995). 

5.  B.  R.  Frieden  and  W.  J.  Cocke,  Phys.  Rev.  E  54,  257-260  (1996). 

6.  W.  J.  Cocke,  Phys.  Fluids  A  8,  1609-1614  (1996). 

7.  B.  R.  Frieden,  Physics  from  Fisher  Information  (Cambridge  University  Press,  Cambridge, 
1998). 

8.  A.  J.  Stam,  Inf.  Control  2,  101-112  (1959). 

9.  B.  R.  Frieden,  Probability,  Statistical  Optics  and  Data  Testing,  2nd  edn.  (Springer,  New 
York,  1 99 1 ) 

10.  F.  Reza,  An  Introduction  to  Information  Theory  (McGraw-Hill,  New  York,  1961). 

11.  Ref.  4,  Appendices  A  and  C. 

12.  G.  A.  Kom  and  T.  M.  Kom,  Mathematical  Handbook,  2nd  edn.  (McGraw-Hill,  New 
York,  1968). 

13.  E.  Kamke,  Differentialgleichungen  Losungsmethoden  und  Losungen  (Chelsea,  New  York, 
i948 ). 

14.  B.  R.  Frieden,  Found.  Phys.  16,  883-906  (1986). 

15.  F.  Benford,  Proc.  Am.  Philos.  Soc.  78,  551-572  (1938). 

16.  H.  H.  Jeffreys,  Theory  of  Probability,  3rd  edn.  (Oxford  University  Press,  London,  1961). 

17.  A.  R.  Plastino  and  A.  Plastino,  Phys.  Rev.  E  54,  4423-4426  (1996). 

18.  Ref.  9,  p.  50. 


\ 

I 


Printed  by  Catherine  Press,  Ltd,  Tempelhof  41,  B-8000  Brugge,  Belgium 


REPORT  DOCUMENTATION  PAGE 


AFRL-SR-BL-TR-OO- 


Public  Renorting  burden  for  thii  collection  of  information  ii  oonuted  to  average  1  hour  per  response,  including  die  6,^ .... 

gathering  and  maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comment  regarding  this  burden  estimates  o,  -  .  jecuon 

of^rrauioo.  including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  information  Operations  andReports,  1215  Jefferson  Davis  Highway. 
VA  •mm^vn  and  m  ibe  Office  of  Manateroent  and  Budget.  Paperwork  Reduction  Project  (0704-0188,)  Washington.  DC  20503. _ 

2.  REPORT  DATE  I  3.  REPORT  TYPE  AND  DATES  COVERED 


Suite  1204,  Arli 


1.  AGENCY  USE  ONLY  (  Leave  Blank) 


4/12/00 


Final  Report  -  9/10/98  -  10/31/99 


5.  FUNDING  NUMBERS 

F49620-98-1 -0228 


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


9  SPONSORING  /  MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES)  10.  SPONSORING  /  MONITORING 

AGENCY  REPORT  NUMBER 

Airforce  Office  of  Scientific  Research/NE 
110  Duncan  Ave.,  Room  B1 15 
Bolling  Airforce  Base 
Washington,  DC  20332-8080 

11.  SUPPLEMENTARY  NOTES 

The  views,  opinions  and/or  findings  contained  in  this  report  are  those  of  the  author(s)  and  should  not  be  construed 
as  an  official  Department  of  the  Army  position,  policy  or  decision,  unless  so  designated  by  other  documentation. 

12  a.  DISTRIBUTION  /  AVAILABILITY  STATEMENT  12  b.  DISTRIBUTION  CODE 

Approved  for  public  release;  distribution  unlimited. 

13.  ABSTRACT  (Maximum  200  words) 

A  new  approach  for  digitally  reducing  the  presence  of  random  atmospheric  turbulence  in  imagery  was 
developed.  This  'image  division'  method  is  based  upon  the  use  of  two  short-exposure  images  as  data.  These 
have  unknown  point  spread  functions  (PSFs)  as  defined  by  the  random  turbulence.  The  latter  are  found  by 
dividing  the  two  image  spectra  (thus  'image  division'),  producing  a  set  of  linear  equations  which  may  be 
inverted  for  the  PSFs.  The  inversion  problem  is  inherently  ill-posed.  However,  the  solution  is  stabilized  by 
imposing  prior  knowledge  about  the  object  and  two  PSFs  :  that  (a)  they  have  finite  support  extensions  and  (b) 
they  numerically  obey  positivity.  Once  the  two  PSFs  are  found,  they  are  used  to  inverse-filter  their 
corresponding  images.  The  result  is  two  output  reconstructions  of  the  object,  which  are  simply  averaged  to 
produce  the  final  output.  The  approach  was  successfully  applied  to  both  simulations  and  to  real  infrared 
imagery  from  Kitt  Peak  National  Observatory. 


4.  TITLE  AND  SUBTITLE 

Topics  in  Unconventional  Imagery 


6.  AUTHOR(S) 

B.  Roy  Frieden,  Principal  Investigator 

Sergio  Barraza-Felix,  Doctoral  Graduate  Student  _ ' 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Optical  Sciences  Center 
University  of  Arizona 

P.O.  Box  210094  -  Tucson,  Arizona  85721 _ 


14.  SUBJECT  TERMS 

Satellite  imagery;  missile  imagery;  imagery  through  turbulence;  blind 
deconvolution;  image  division  method;  wind  shear  detection 


15.  NUMBER  OF  PAGES 

41 


16.  PRICE  CODE 


17.  SECURITY  CLASSIFICATION 

OR  REPORT  _ 

UNCLASSIFIED 


18.  SECURITY  CLASSIFICATION 
ON  THIS  PAGE 
UNCLASSIFIED 


19.  SECURITY  CLASSIFICATION 

OF  ABSTRACT _ 

UNCLASSIFIED 


20.  LIMITATION  OF  ABSTRACT 


MEMORANDUM  OF  TRANSMITTAL 


Airforce  Office  of  Scientific  Research/NE 
1 10  Duncan  Ave. ,  Room  B1 15 
Bolling  Airforce  Base 
Washington,  DC  20332-8080 

□  Reprint  (Orig  +  2  copies)  □  Technical  Report  (Orig  +  2  copies) 

□  Manuscript  (1  copy)  [X]  Final  Progress  Report  (Orig  +  2  copies) 

□  Related  Materials,  Abstracts,  Theses  (1  copy) 

CONTRACT/GRANT  NUMBER:  F49620-98- 1-0228 

REPORT  TITLE:  Final  Report  -  Topics  in  Unconventional  Imagery - _ 


is  forwarded  for  your  information. 

SUBMITTED  FOR  PUBLICATION  TO  (applicable  only  if  report  is  manuscript): 


Sincerely, 


B.  Roy  Frieden,  Professor 
The  University  of  Arizona 
1630  E.  University  Blvd. 
Tucson,  Arizona  85721 


Enclosure  4 


